Large deviations principle for the Adaptive Multilevel 
Splitting Algorithm in an idealized setting 

Charles-Edouard Brehier * 


Abstract 

The Adaptive Multilevel Splitting (AMS) algorithm is a powerful and versatile method 
for the simulation of rare events. It is based on an interacting (via a mutation-selection 
procedure) system of replicas, and depends on two integer parameters: $n \in \mathbb{N}^*$, the size 
of the system, and the number $k \in \{1, \ldots, n-1\}$ of replicas that are eliminated and 
resampled at each iteration. 

In an idealized setting, we analyze the performance of this algorithm in terms of a Large 
Deviations Principle when $n$ goes to infinity, for the estimation of the (small) probability 
$\mathbb{P}(X > a)$, where $a$ is a given threshold and $X$ is a real-valued random variable. The proof 
uses the technique introduced in [BLR15]: in order to study the log-Laplace transform, 
we rely on an auxiliary functional equation. 

Such Large Deviations Principle results are potentially useful to study the algorithm 
beyond the idealized setting, in particular to compute rare transition probabilities for 
complex high-dimensional stochastic processes. 
1 Introduction 

In many problems from engineering, biology, chemistry, physics or finance, rare events are 
often critical and have a huge impact on the phenomena which are studied. From a general 
mathematical perspective, we may consider the following situation: let $(X_t)_{t \in T}$, where 
$T = \mathbb{N}$ or $\mathbb{R}^+$, be a (discrete or continuous in time) stochastic process, taking values in 
$\mathbb{R}^d$. Assume that $A, B \subset \mathbb{R}^d$ are two metastable regions: starting from a neighborhood of A (resp. of 
B), the probability that the process reaches B (resp. A) before hitting A (resp. B) is very 
small (typically, less than 10“^^). As a consequence, a direct numerical Monte-Carlo with an 
ensemble of size N does not provide significant results when N is reasonably large (typically, 
less than 10^^) in real-life applications. 

Even if theoretical asymptotic expansions on quantities of interest are available - such as the 
Kramers-Arrhenius law given for instance by the Freidlin-Wentzell Large Deviations Theory 
or Potential Theory for the exit problem of a diffusion process in the small noise regime - in 
practice their explicit computation is not possible (for instance when the dimension is large) 
and numerical simulations are unavoidable. 
It is thus essential to propose efficient and general methods, and to rigorously study their 
consistency and efficiency properties. Two main families of methods were introduced in 
the 1950s and have been studied extensively since then, in order to improve the Monte-Carlo simulation 
algorithms, in particular for rare events: importance sampling and importance splitting (see 
for instance [AG07], [RT09] for general reviews of these methods and [KH51] for the historical 
introduction of importance splitting). The main difference between these two methods is 
the following: the first one is intrusive, meaning that the dynamics of the stochastic process 
(more generally, the distribution of the random variable of interest) is modified so that the 
probability of the event of interest increases and the event is realized more often in a 
Monte-Carlo simulation, while the second is not intrusive and can thus be used more directly for complex 
problems. Instead, for importance splitting strategies, the state space is decomposed as a 
nested sequence of regions which are visited sequentially and more easily by an interacting 
system of replicas. 

In this paper, we focus on an importance splitting strategy which is known as the Multilevel 
Splitting approach and describe it in the following setting. Let $h : \mathbb{R}^d \to \mathbb{R}$ be a given function 
and assume we want to estimate the probability $p = \mathbb{P}(X > a)$ that a real-valued random 
variable $X = h(Y)$ (where $Y$ is an $\mathbb{R}^d$-valued random variable) belongs to $(a, +\infty)$ for a given 
threshold $a \in \mathbb{R}$. This situation is not restrictive for many applications; indeed, we may take 
$X = \mathbb{1}_{\tau_B < \tau_A}$ and any $a \in (0,1)$ in the situation described above, where $\tau_A$ and $\tau_B$ are the 
hitting times of $A$ and $B$ by the process. A key assumption on the distribution of $X$ is the 
following: we assume that the cumulative distribution function $F$ of $X$ - i.e. $F(x) = \mathbb{P}(X \le x)$ 
for any $x \in \mathbb{R}$ - is continuous; for convenience, we also assume that $F(0) = 0$ - i.e. $X > 0$ 
almost surely. 

The multilevel splitting approach (see [KH51], [GHSZ99], [CDMFG12] for instance) is 
based on the following decomposition of $p$ as a telescopic product of conditional probabilities: 

$$p = \mathbb{P}(X > a) = \prod_{i=1}^{N} \mathbb{P}(X > a_i \mid X > a_{i-1}), \tag{1}$$

where $a_0 = 0 < a_1 < \ldots < a_N = a$ is an increasing sequence of levels. In other words, 
the realization of the event $\{X > a\}$ is split into the realizations of the $N$ events $\{X > a_i\}$ 
conditional on $\{X > a_{i-1}\}$; each event has a larger probability than the initial one and is 
thus much easier to realize. Then each of the conditional probabilities is estimated separately, 
for instance with independent Monte-Carlo simulations, or using a Sequential Monte-Carlo 
technique with a splitting of successful trajectories. This approach has been studied with 
different viewpoints and variants under different names in the literature: nested sampling 
[Ski06], [Ski07], subset simulation [AB01], RESTART [VAVA91], [VAVA94]. 

For future reference, we introduce the following (unbiased) estimator of $p$ given by the 
multilevel splitting approach with $N$ levels and $n$ replicas: 

$$\hat{p}^{N}_{n} = \prod_{i=1}^{N} \frac{1}{n} \sum_{m=1}^{n} \mathbb{1}_{X_m^{(i)} > a_i}, \tag{2}$$

where the random variables $(X_m^{(i)})_{1 \le m \le n,\, 1 \le i \le N}$ are independent and the distribution of $X_m^{(i)}$ 
is $\mathcal{L}(X \mid X > a_{i-1})$. Thus $\hat{p}^{N}_{n}$ is a product of $N$ independent Monte-Carlo estimators of the 
conditional probabilities in (1). 
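
As an illustration of the fixed-level estimator (2), here is a minimal Python sketch. It assumes (this is not part of the text above) that $X$ is exponentially distributed with mean 1, so that a sample from $\mathcal{L}(X \mid X > z)$ can be drawn as $z$ plus an independent exponential variable; the function name `fixed_level_estimate` and the numerical values are hypothetical choices made for the example.

```python
import math
import random

def fixed_level_estimate(levels, n, rng):
    """Product of N independent Monte-Carlo estimators, as in (2).

    Assumption of this sketch: X is exponential with mean 1, so sampling
    from L(X | X > z) amounts to z + Exp(1) (memoryless property).
    `levels` is the increasing sequence a_1 < ... < a_N = a, with a_0 = 0.
    """
    estimate = 1.0
    previous = 0.0
    for level in levels:
        # n independent samples from L(X | X > a_{i-1}), counted when above a_i
        hits = sum(1 for _ in range(n)
                   if previous + rng.expovariate(1.0) > level)
        estimate *= hits / n
        previous = level
    return estimate

rng = random.Random(0)
a = 6.0                                        # threshold, so p = exp(-6)
N = 6                                          # number of levels
levels = [a * i / N for i in range(1, N + 1)]  # equal conditional probabilities
runs = [fixed_level_estimate(levels, 200, rng) for _ in range(200)]
mean_estimate = sum(runs) / len(runs)
print(mean_estimate, math.exp(-a))
```

With equally spaced levels, each conditional probability equals $e^{-1}$ in this exponential example, and the empirical mean over independent runs should be close to $p = e^{-6}$.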
The efficiency of the algorithm depends crucially on the choice of the sequence of levels 
$(a_i)_{1 \le i \le N}$: for a fixed size $N$, the variance of the estimator is minimized when the conditional 
probabilities are equal (to $p^{1/N}$); moreover the associated variance converges (to $-p^2 \log(p)/n$) 
when $N$ goes to infinity - see for instance [CDMFG12] for more details. 

To get more flexible algorithms, a possible approach is to compute levels adaptively, 
as proposed in [CG07] and studied extensively in recent years, see for instance [BLR15], 
[BGT14], [CG14], [GHML11], [Sim14], [Wal14]. It is essential to check that these adaptive 
versions still give reliable results, and to prove that they do so efficiently. 

More precisely, we consider Algorithm 2.2 defined below, which depends on two parameters 
$n$ and $k$, with the condition $1 \le k \le n-1$. We let a system of $n$ interacting 
replicas evolve, and at each iteration a selection-mutation procedure resamples the system as 
follows: we compute the $k$-th order statistic $Z$ of the system - which corresponds to the so-called level at the 
given iteration - and eliminate the $k$ replicas with values at most $Z$; they are 
then resampled using the conditional distribution $\mathcal{L}(X \mid X > Z)$ of $X$ conditional on $\{X > Z\}$. 
The algorithm stops when $Z \ge a$, and we define an estimator depending on the number 
of iterations and on the terminal configuration of the system of replicas, see (5). In practice, 
we need to be able to sample according to the conditional distribution $\mathcal{L}(X \mid X > z)$ for 
any value of $z$: this is part of the idealized setting assumption; even if it is rarely satisfied 
in real-life applications, the study of the algorithm in that setting is already challenging and 
yields very interesting results, that can usually be generalized beyond this simplified case at 
the price of a much more intricate analysis. 

Let us recall a few fundamental results. In [GHML11] (see also [Sim14], [Wal14]), it was 
proved that for any value of $n \ge 2$, $\hat{p}^{n,1}$ is an unbiased estimator of $p$ - meaning that 
$\mathbb{E}[\hat{p}^{n,1}] = p$. This result was extended to general $1 \le k \le n-1$ in [BLR15]. Efficiency properties 
have been studied with the proof of Central Limit Theorems in two different kinds of regimes: 
either $k$ is fixed and $n \to +\infty$ (see [BGT14] as well as [GHML11] and [Sim14] when $k = 1$), 
or both $k$ and $n$ go to infinity, in such a way that $k/n$ converges to $\alpha \in (0,1)$ - which gives a 
fixed proportion of resampled replicas at each iteration, see [CG07] and the more recent work 
[CG14] in a very general framework. 

The efficiency is ensured by the observation that the asymptotic variance is the same for 
both the adaptive and the non-adaptive versions. Moreover, it is much smaller than when 
using a crude Monte-Carlo estimator, i.e. the empirical average 

$$\overline{p}_n = \frac{1}{n} \sum_{m=1}^{n} \mathbb{1}_{X_m > a}, \tag{3}$$

where the random variables $(X_m)_{1 \le m \le n}$ are independent and identically distributed, with 
distribution $\mathcal{L}(X)$. 

In this paper, we prove a similar result with a different criterion, which seems to be 
original compared with existing literature: we prove a Large Deviations Principle 
for the distribution of the estimator given by the adaptive algorithm when $k$ is fixed and 
$n \to +\infty$. Our main result is Theorem 3.1, which in particular yields for any given $\epsilon > 0$ 

$$\frac{1}{n} \log\Big(\mathbb{P}\big(|\hat{p}^{n,k} - p| \ge \epsilon\big)\Big) \underset{n \to +\infty}{\longrightarrow} -\min\big(I(p+\epsilon), I(p-\epsilon)\big) < 0.$$

The rate function $I$ - see (7) - obtained in Theorem 3.1 does not depend on $k$. We then 
compare this rate function with $\mathcal{I}$ - see (23) - the rate function obtained for a crude Monte- 
Carlo estimator $\overline{p}_n$ given by (3) (thanks to the Cramér Theorem, see [DZ10]), and show that for 
any $y \in (0,1) \setminus \{p\}$ we have $I(y) > \mathcal{I}(y)$ - we have $\mathcal{I}(p) = I(p) = 0$, and $\mathcal{I}(y) = I(y) = +\infty$ if 
$y \notin (0,1)$ - and thus 

$$\lim_{n \to +\infty} \frac{\mathbb{P}(\hat{p}^{n,k} - p \ge \epsilon)}{\mathbb{P}(\overline{p}_n - p \ge \epsilon)} = 0.$$

In other words, for large $n$, the probability that $\hat{p}^{n,k}$ deviates from $p$ from above (and similarly 
from below) with threshold $\epsilon > 0$ decreases exponentially fast, at a faster rate than for $\overline{p}_n$. 
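
The comparison of the two rate functions can be explored numerically. The sketch below assumes that the rate function of the crude Monte-Carlo estimator is the classical Cramér rate function for averages of Bernoulli variables with parameter $p$, and uses for the AMS estimator the expression $I(y) = J(\log(y))$ with $J(z) = z - \log(p) - z\log(z/\log(p))$; the function names and the value $p = 0.01$ are hypothetical choices for the example.

```python
import math

def ams_rate(y, p):
    """Rate function I(y) = J(log y), J(z) = z - log p - z log(z / log p)."""
    if not 0.0 < y < 1.0:
        return math.inf
    z = math.log(y)
    return z - math.log(p) - z * math.log(z / math.log(p))

def cramer_bernoulli_rate(y, p):
    """Classical Cramer rate function for averages of Bernoulli(p) variables."""
    if not 0.0 < y < 1.0:
        return math.inf
    return y * math.log(y / p) + (1.0 - y) * math.log((1.0 - y) / (1.0 - p))

p = 0.01
# Grid on (0, 1), skipping the point y = p where both rate functions vanish.
grid = [i / 1000 for i in range(1, 1000) if i != 10]
gaps = [ams_rate(y, p) - cramer_bernoulli_rate(y, p) for y in grid]
print(min(gaps))
```

On this grid the difference stays positive, which is consistent with the strict inequality between the two rate functions away from $y = p$.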

Moreover, we prove that the non-adaptive, fixed-levels estimator $\hat{p}^{N}_{n}$ satisfies a Large Deviations 
Principle when $n \to +\infty$ with rate function $I_N$, for a fixed number of levels $N$ and when 
the levels are chosen in an optimal way, namely such that $\mathbb{P}(X > a_i \mid X > a_{i-1}) = p^{1/N}$ does 
not depend on $i$. We then show that $\lim_{N \to +\infty} I_N(y) \le I(y)$ for any $y \in \mathbb{R}$: this inequality 
is sufficient to prove that asymptotically the adaptive algorithm performs (at least) as well as 
the non-adaptive version in this setting, in terms of Large Deviations. 

The proof of Theorem 3.1 relies on the technique introduced in [BLR15]. First, we restrict 
the study of the properties of the algorithm to the case when $X$ is exponentially distributed 
with parameter 1 (this key remark was first introduced in [GHML11] and was also used in [Sim14], 
[Wal14]). Instead of working on $\hat{p}^{n,k}$ directly, we focus on its logarithm $\log(\hat{p}^{n,k})$, and prove 
that when considering the algorithm as depending on an initial condition $x$, the Laplace 
transform of the latter is solution of a functional (integral) equation (with respect to the $x$ 
variable) - thanks to a decomposition of the realizations of the algorithm according to the value 
of the first level. To study the equation in the asymptotic regime considered in this paper, 
we then derive a linear ordinary differential equation of order $k$ and perform an asymptotic 
expansion. Note that we do not give all details for the derivation of the differential equation 
and the basic properties of its coefficients; for some points we refer the reader to [BLR15], 
where all the arguments are proved in detail. Here we mainly focus on the proof of the 
new asymptotic results as well as on the interpretation of the Large Deviations Principle for 
our purpose. 

It seems that studying the performance of multilevel splitting algorithms via a Large Deviations 
Principle is an original approach, which can complement the more classical studies 
which are all based on Central Limit Theorems. In this paper, we prove a result in a specific 
regime ($k$ is fixed, $n \to +\infty$) in the idealized setting. To go further, it would be interesting 
to look at other regimes ($k, n \to +\infty$ with $k/n \to \alpha \in (0,1)$) and to go beyond the idealized 
setting. This will be the subject of future investigation. 

The paper is organized as follows. In Section 2, we introduce our main assumptions 
(Section 2.1), describe the Adaptive Multilevel Splitting algorithm (Section 2.2) and recall 
several of its fundamental properties used in the sequel of the article (Section 2.3). The main 
result of this paper is given in Section 3: it is the Large Deviations Principle for the estimator 
of the probability given by the AMS algorithm, see Theorem 3.1. An important auxiliary result 
is stated in Section 4 and proofs are carried out in Section 5, some technical estimates being 
proved in Section 7. We compare the performance in terms of the Large Deviations Principle 
of the AMS algorithm with two other methods in Section 6: a crude Monte-Carlo method and 
a fixed-level splitting method. Finally, we give some concluding remarks and perspectives in 
Section 8. 
2 Description of the Adaptive Multilevel Splitting algorithm 

2.1 Assumptions 

Let X be some real random variable. For simplicity, we assume that X > 0 almost surely. 

We want to estimate the probability $p = \mathbb{P}(X > a)$, where $a > 0$ is some threshold. When 
$a$ goes to $+\infty$, $p$ goes to 0 and we have to estimate the probability of a rare event. 

We make a fundamental assumption on the distribution of X. 

Assumption 2.1. Let F denote the cumulative distribution function of X: we assume that 
F is continuous. 

More generally, for both theoretical and practical purposes, we introduce for $0 \le x \le a$ the 
conditional probability 

$$P(x) = \mathbb{P}(X > a \mid X > x); \tag{4}$$

we also denote by $\mathcal{L}(X \mid X > x)$ the associated conditional distribution, and by $F(\cdot\,; x)$ its cumulative 
distribution function: for any $y \ge x$ we have $F(y; x) = \frac{F(y) - F(x)}{1 - F(x)}$ whenever $F(x) < 1$. 

We notice two important equalities: $P(a) = 1$, and the estimated probability is $p = P(0)$; 
in fact, the distribution of $X$ is equal to $\mathcal{L}(X \mid X > 0)$. 

The idealized setting refers to the following assumptions: 

• Assumption 2.1 is satisfied (theoretical condition); 

• it is possible to sample according to the conditional distribution $\mathcal{L}(X \mid X > x)$ for any 
$x \in [0, a)$ (practical condition). 

In view of a practical implementation of the algorithm, the second condition is probably 
the most restrictive. One may rely on some approximation of the conditional distribution 
$\mathcal{L}(X \mid X > x)$ thanks to a Metropolis-Hastings algorithm: in that case (see [CG14] for instance), 
the analysis we develop here does not apply, but gives an interesting insight for the behavior 
in the case of a large number of steps in the Metropolis-Hastings auxiliary scheme (rigorously, 
we treat the case of an infinite number of steps). 

2.2 The algorithm 

We now present the Adaptive Multilevel Splitting algorithm, under the assumptions of Section 
2.1 above. 

The algorithm depends on two parameters: 

• the number of replicas $n$; 

• the number $k \in \{1, \ldots, n-1\}$ of replicas that are resampled at each iteration. 

The other necessary parameters are the initial condition $x$ and the stopping threshold $a$: 
the aim is to estimate the conditional probability $P(x)$ introduced in (4). For future reference, 
we denote the algorithm by AMS$(n, k; a, x)$. 

The dependence with respect to $x$ allows us below to state fundamental functional equations 
on useful observables of the estimator computed at the end of the iterations of the 
algorithm, as a function of $x$. In practice, we are interested in the case $x = 0$; in this situation, 
the algorithm is denoted by AMS$(n, k; a)$. 
Before we detail the algorithm, we introduce important notation. First, when we consider 
a random variable $X_i^j$, the subscript $i$ denotes the index in $\{1, \ldots, n\}$ of a replica, while the 
superscript $j$ denotes the iteration of the algorithm. 

Moreover, we use the following notation for order statistics. Let $Y = (Y_1, \ldots, Y_n)$ be independent 
and identically distributed (i.i.d.) real-valued random variables with continuous 
cumulative distribution function; then there exists almost surely a unique (random) permutation 
$\sigma$ of $\{1, \ldots, n\}$ such that $Y_{\sigma(1)} < \ldots < Y_{\sigma(n)}$. For any $k \in \{1, \ldots, n\}$, we then denote by 
$Y_{(k)} = Y_{\sigma(k)}$ the so-called $k$-th order statistic of the sample $Y$. Sometimes we need to specify 
the size of the sample of which we consider the order statistics: we then use the notation 
$Y_{(k,n)}$. 

We are now in position to write the AMS$(n, k; a, x)$ algorithm. 

Algorithm 2.2 (Adaptive Multilevel Splitting, AMS$(n, k; a, x)$). 

Initialization: Set the initial level $Z^0 = x$. 

Sample $n$ i.i.d. realizations $X_1^0, \ldots, X_n^0$, with distribution $\mathcal{L}(X \mid X > x)$. 

Define $Z^1 = X^0_{(k)}$, the $k$-th order statistic of the sample $X^0 = (X_1^0, \ldots, X_n^0)$, and $\sigma^1$ the 
(a.s.) unique associated permutation: $X^0_{\sigma^1(1)} < \ldots < X^0_{\sigma^1(n)}$. 

Set $j = 1$. 

Iterations (on $j \ge 1$): While $Z^j < a$: 

• Conditional on $Z^j$, sample $k$ new independent random variables $(Y_1^j, \ldots, Y_k^j)$, according 
to the law $\mathcal{L}(X \mid X > Z^j)$. 

• Set 

$$X_i^j = \begin{cases} Y^j_{(\sigma^j)^{-1}(i)} & \text{if } (\sigma^j)^{-1}(i) \le k, \\ X_i^{j-1} & \text{if } (\sigma^j)^{-1}(i) > k. \end{cases}$$

In other words, we resample exactly $k$ out of the $n$ replicas, namely those with index 
$i$ such that $X_i^{j-1} \le Z^j$, i.e. such that $i \in \{\sigma^j(1), \ldots, \sigma^j(k)\}$ (which is equivalent to 
$(\sigma^j)^{-1}(i) \le k$). They are resampled according to the conditional distribution $\mathcal{L}(X \mid X > Z^j)$. 
The other replicas are not modified. 

• Define $Z^{j+1} = X^j_{(k)}$, the $k$-th order statistic of the sample $X^j = (X_1^j, \ldots, X_n^j)$, and $\sigma^{j+1}$ 
the (a.s.) unique associated permutation: $X^j_{\sigma^{j+1}(1)} < \ldots < X^j_{\sigma^{j+1}(n)}$. 

• Finally increment $j \leftarrow j + 1$. 

End of the algorithm: Define $J^{n,k}(x) = j - 1$ as the (random) number of iterations. Notice 
that $J^{n,k}(x)$ is such that $Z^{J^{n,k}(x)} < a$ and $Z^{J^{n,k}(x)+1} \ge a$. 

Notice for instance that $J^{n,k}(x) = 0$ if and only if $Z^1 \ge a$: we mean that in this case the 
algorithm has required 0 iterations, since the stopping condition at the beginning of the loop 
(on $j$) is satisfied without entering the loop. 

The estimator of the probability $P(x)$ is defined by 

$$\hat{p}^{n,k}(x) = C^{n,k}(x) \left(1 - \frac{k}{n}\right)^{J^{n,k}(x)}, \tag{5}$$

with 

$$C^{n,k}(x) = \frac{1}{n} \mathrm{Card}\left\{ i;\ X_i^{J^{n,k}(x)} \ge a \right\}. \tag{6}$$

The interpretation of the factor $C^{n,k}(x)$ is the following: it is the proportion of the replicas 
$X_i^{J^{n,k}(x)}$ which satisfy $X_i^{J^{n,k}(x)} \ge a$: since $Z^{J^{n,k}(x)+1} \ge a$, we have $C^{n,k}(x) \ge \frac{n-k+1}{n}$. 
Notice that $C^{n,1}(x) = 1$. 

When $x = 0$, to simplify notations we set $\hat{p}^{n,k} = \hat{p}^{n,k}(0)$. 
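
The algorithm and its estimator can be sketched in a few lines of Python. This is only an illustration under an explicit assumption: $X$ is taken exponentially distributed with mean 1, so that the idealized sampling from $\mathcal{L}(X \mid X > z)$ is simply $z$ plus an independent exponential variable. The function name `ams_estimate` and the parameter values are hypothetical.

```python
import math
import random

def ams_estimate(n, k, a, rng, x=0.0):
    """One run of AMS(n, k; a, x) for X exponential with mean 1 (assumption).

    Returns (estimate, number of iterations). Sampling from L(X | X > z)
    is done as z + Exp(1), which is exact for the exponential distribution.
    """
    replicas = [x + rng.expovariate(1.0) for _ in range(n)]
    iterations = 0
    while True:
        replicas.sort()
        level = replicas[k - 1]            # k-th order statistic: current level
        if level >= a:
            break
        # Resample the k smallest replicas, conditionally on being above the level.
        for i in range(k):
            replicas[i] = level + rng.expovariate(1.0)
        iterations += 1
    # Proportion of replicas above the threshold in the final configuration.
    correction = sum(1 for value in replicas if value >= a) / n
    return correction * (1.0 - k / n) ** iterations, iterations

rng = random.Random(0)
n, k, a = 50, 2, 3.0
runs = [ams_estimate(n, k, a, rng)[0] for _ in range(400)]
mean_estimate = sum(runs) / len(runs)
print(mean_estimate, math.exp(-a))
```

Since the estimator is unbiased, the empirical mean over independent runs should be close to $p = e^{-a}$; when $k = 1$ the correction factor is always equal to 1.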


2.3 Properties of the AMS Algorithm 2.2 

Well-posedness 

We first recall some important results on the well-posedness of the algorithm. For more 
detailed statements and complete proofs, see Section 3.2 in [BLR15], in particular Proposition 
3.2 there. 

First, at each iteration $j$ of the algorithm, conditional on the level $Z^j$, the resampling 
produces a family of $n$ random variables $(X_i^j)_{1 \le i \le n}$ which are independent and identically 
distributed, with distribution $\mathcal{L}(X \mid X > Z^j)$. By Assumption 2.1, conditional on $Z^j$ the latter 
conditional distribution also admits a continuous cumulative distribution function $F(\cdot\,; Z^j)$; 
as a consequence, almost surely the permutation $\sigma^{j+1}$ is unique, and the level $Z^{j+1}$ is 
well-defined. 

Moreover, if we assume that $P(x) > 0$, almost surely the algorithm stops after a finite 
number of steps, for any values of $k$ and $n$ such that $1 \le k \le n-1$: the random variable 
$J^{n,k}(x)$ almost surely takes values in $\mathbb{N}$, and the estimator $\hat{p}^{n,k}(x)$ is well-defined and takes 
values in $(0, 1]$. 


Reduction to the exponential case 

We now state properties that are essential for our theoretical study of the algorithm below. 

One of the main tools in [BLR15] and [BGT14], which was also used in [GHML11] in 
the case $k = 1$, is the restriction to the case where the random variables are exponentially 
distributed. More precisely, assume that $P(x) > 0$, and denote by $\mathcal{E}(1)$ the exponential 
distribution with mean 1. Then in distribution the algorithm AMS$(n, k; a)$ is equal to the 
algorithm AMS$_{\mathrm{expo}}(n, k; -\log(p))$, in which we assume that the distribution is $\mathcal{E}(1)$; a similar 
result holds for AMS$(n, k; a, x)$ when $x \in [0, a)$. In particular, the associated estimators are 
equally distributed. The main argument is the well-known equality of distribution $F(X) = U$, 
where $U$ is uniformly distributed on $(0,1)$. 

In the sequel, we state in Section 3 our results in the general setting - i.e. for AMS$(n, k; a)$, 
with the probability $p$ and the estimator $\hat{p}^{n,k}$ - but in the remainder of the paper we give proofs 
in the exponential case, namely for AMS$_{\mathrm{expo}}(n, k; a_{\mathrm{expo}}, x)$ with $a_{\mathrm{expo}} = -\log(p)$, and we omit 
the reference to the exponential case to simplify the notation. Whether we consider the general 
or the exponential case will be clear from the context. 


3 The Large Deviations Principle result for the AMS algorithm 

The main result of this article is the following Theorem 3.1, which states a Large Deviations 
Principle (in the sense of [DZ10]) for the distribution of $\hat{p}^{n,k}$, for fixed probability 
$p > 0$ and $k \in \mathbb{N}^*$, in the limit $n \to +\infty$. 


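Theorem 3.1. Assume that $p \in (0,1)$ and $k \in \mathbb{N}^*$ are fixed. Then the sequence 
$(\mathcal{L}(\hat{p}^{n,k}))_{n > k}$ of distributions of the estimator $\hat{p}^{n,k}$ of $p$ obtained by the AMS$(n, k; a)$ algorithm satisfies a 
Large Deviations Principle with the rate function $I$ defined by 

$$I(y) = \begin{cases} +\infty & \text{if } y \notin (0,1), \\ \log(y) - \log(p) - \log(y) \log\left(\frac{\log(y)}{\log(p)}\right) & \text{if } y \in (0,1). \end{cases} \tag{7}$$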


We observe that the rate function does not depend on k. 

Notice that the statement above is restricted to $p \in (0,1)$. Indeed, when $p = 1$, we have 
almost surely $\hat{p}^{n,k} = 1$ (the algorithm stops after 0 iterations). Moreover, we always estimate 
the probability of events which have a positive probability (otherwise the algorithm does not 
stop after a finite number of iterations). 

The following Proposition describes some properties of the rate function I. 


Proposition 3.2. The rate function $I$ is of class $C^\infty$ on its domain $(0,1)$. 

Moreover, $p$ is the unique minimizer of $I$: we have $I(p) = I'(p) = 0$, $I''(p) = -\frac{1}{p^2 \log(p)} > 0$. 
Finally, for any $y \in (0,1) \setminus \{p\}$ we have $I(y) > 0$; $I$ is decreasing on $(0, p)$ and is increasing 
on $(p, 1)$. 

Proof. Straightforward computations yield that for $y \in (0,1)$ we have 

$$\frac{dI(y)}{dy} = \frac{\log(-\log(p)) - \log(-\log(y))}{y},$$

$$\frac{d^2 I(y)}{dy^2} = -\frac{\log(-\log(p)) - \log(-\log(y))}{y^2} - \frac{1}{y^2 \log(y)}.$$

□ 
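
The properties of the rate function at $y = p$ can be checked numerically. The following sketch evaluates $I(y) = J(\log(y))$ with $J(z) = z - \log(p) - z \log(z/\log(p))$ and approximates $I''(p)$ by a central finite difference; the value $p = 0.01$ and the step `h` are hypothetical choices, and the comparison with $-p^2\log(p)$ uses the asymptotic variance quoted in the Introduction.

```python
import math

def rate_function(y, p):
    """Rate function I(y) = J(log y), with J(z) = z - log p - z log(z / log p)."""
    if not 0.0 < y < 1.0:
        return math.inf
    z = math.log(y)
    return z - math.log(p) - z * math.log(z / math.log(p))

p = 0.01
h = 1e-4
# Central finite difference for the second derivative at y = p.
second_derivative = (rate_function(p + h, p) - 2.0 * rate_function(p, p)
                     + rate_function(p - h, p)) / h ** 2
exact_second_derivative = -1.0 / (p ** 2 * math.log(p))
print(second_derivative, exact_second_derivative)
print(1.0 / second_derivative, -p ** 2 * math.log(p))
```

The inverse of the curvature at the minimizer is close to $-p^2 \log(p)$, the asymptotic variance appearing in the Central Limit Theorem regime.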


Let $\epsilon \in (0, \max(p, 1-p))$; then from Theorem 3.1 we have when $n \to +\infty$ 

$$\frac{1}{n} \log\Big(\mathbb{P}\big(|\hat{p}^{n,k} - p| \ge \epsilon\big)\Big) \to -\min\big(I(p+\epsilon), I(p-\epsilon)\big) < 0. \tag{8}$$

Applying the Borel-Cantelli Lemma, we get the almost sure convergence $\hat{p}^{n,k} \to p$. 

Remark 3.3. The almost sure limit is consistent with the unbiasedness result ($\mathbb{E}[\hat{p}^{n,k}] = p$) 
from [BLR15]. There we were only able to prove the convergence in probability of $\hat{p}^{n,k}$ to $p$. 
Notice also that in [BGT14] we proved a Central Limit Theorem: 

$$\sqrt{n}\,(\hat{p}^{n,k} - p) \to \mathcal{N}(0, -p^2 \log(p)).$$

The asymptotic variance is given by $1/I''(p)$. 

We conclude this section with a result showing that the choice of the regime $p$ (and $k$) 
fixed and $n \to +\infty$ is crucial to get Theorem 3.1. Indeed, set $k = 1$, and for a given $\sigma > 0$ 
assume that $n$ and $p$ are related through the following formula: $-\log(p) = \sigma^2 n$. Then $\hat{p}^{n,1}/p$ 
converges (in law) to a log-normal distribution, as stated in the following proposition. 

Proposition 3.4. If $-\log(p) = \sigma^2 n$, we have the convergence in distribution 

$$\lim_{n \to +\infty} \frac{\hat{p}^{n,1}}{p} = \exp\left(\sigma Z - \frac{\sigma^2}{2}\right),$$

where $Z \sim \mathcal{N}(0,1)$. 

The proof is postponed to Section 5.1, since it uses the same arguments as the proof of 
Theorem 3.1 in the case $k = 1$. 

Let $\epsilon > 0$. Then (compare with (8) with $\epsilon p$ instead of $\epsilon$) 

$$\mathbb{P}\left(\left|\frac{\hat{p}^{n,1}}{p} - 1\right| \ge \epsilon\right) \underset{n \to +\infty,\ -\log(p) = \sigma^2 n}{\longrightarrow} \mathbb{P}_{Z \sim \mathcal{N}(0,1)}\left(\left|\exp\left(\sigma Z - \frac{\sigma^2}{2}\right) - 1\right| \ge \epsilon\right) > 0,$$

where the limit is positive, while owing to (8), when $p$ is fixed, $\mathbb{P}\left(\left|\frac{\hat{p}^{n,1}}{p} - 1\right| \ge \epsilon\right)$ converges to 0 
exponentially fast when $n \to +\infty$. 


4 Strategy of the proof 

To prove Theorem 3.1, we in fact first prove a Large Deviations Principle for the distribution of $\log(\hat{p}^{n,k})$, 
with rate function $J$ given below. 

Proposition 4.1. Assume that $p \in (0,1)$ and $k \in \mathbb{N}^*$ are fixed. Then the sequence 
$(\mathcal{L}(\log(\hat{p}^{n,k})))_{n > k}$ of distributions of $\log(\hat{p}^{n,k})$ obtained by the AMS$(n, k; a)$ algorithm satisfies a Large Deviations 
Principle with the rate function $J$ defined by 

$$J(z) = \begin{cases} +\infty & \text{if } z \ge 0, \\ z - \log(p) - z \log\left(\frac{z}{\log(p)}\right) & \text{if } z < 0. \end{cases} \tag{9}$$

Then Theorem 3.1 immediately follows from Proposition 4.1 and the application of the 
contraction principle (see [DZ10], Theorem 4.2.1): we have $\hat{p}^{n,k} = \exp(\log(\hat{p}^{n,k}))$, and we 
obtain the rate function with the identity $I(y) = J(\log(y))$. 

The proof of Proposition 4.1 relies on the use of the Gärtner-Ellis Theorem (see Theorem 
2.3.6 in [DZ10]) and the asymptotic analysis when $n \to +\infty$ of the log-Laplace transform of 
$\log(\hat{p}^{n,k})$. 

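Proposition 4.2. Set for any $1 \le k \le n-1$ and any $\lambda \in \mathbb{R}$ 

$$\Lambda_{n,k}(\lambda) = \log\Big(\mathbb{E}\big[\exp\big(\lambda \log(\hat{p}^{n,k})\big)\big]\Big). \tag{10}$$

Then for any fixed $k \in \mathbb{N}^*$ and any $\lambda \in \mathbb{R}$ we have the convergence 

$$\frac{1}{n} \Lambda_{n,k}(n\lambda) \underset{n \to +\infty}{\longrightarrow} \Lambda(\lambda) = -\log(p)\big(\exp(-\lambda) - 1\big). \tag{11}$$

The Fenchel-Legendre transform $\Lambda^*$ of $\Lambda$ satisfies: 

$$\Lambda^*(z) = \sup_{\lambda \in \mathbb{R}} \big(\lambda z - \Lambda(\lambda)\big) = \begin{cases} +\infty & \text{if } z \ge 0, \\ z - \log(p) - z \log\left(\frac{z}{\log(p)}\right) & \text{if } z < 0. \end{cases} \tag{12}$$

Then for any $k \in \mathbb{N}^*$, the sequence $(\mathcal{L}(\log(\hat{p}^{n,k})))_{n > k}$ of distributions satisfies a Large Deviations 
Principle, with the rate function $J = \Lambda^*$. 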

The proof of (11) is the main task of this paper. In Section 5.1, we give a first easy proof in 
the case $k = 1$, relying on the knowledge of the distribution of $J^{n,1}$: it is a Poisson distribution 
with mean $-n\log(p)$. We can then compute explicitly $\Lambda_{n,1}(\lambda)$ and prove (11). In Section 5.2, 
we study the general case $k \ge 1$ with the method introduced in [BLR15], in the exponential 
case: for the algorithm AMS$_{\mathrm{expo}}(n, k; a, x)$ we derive a functional equation on the Laplace 
transform $\exp(\Lambda_{n,k}(\lambda))$ as a function of the initial condition $x$, for fixed parameter $\lambda$. 

For completeness, we close this Section with the computation of the Fenchel-Legendre 
transform $J = \Lambda^*$ of $\Lambda$ in Proposition 4.2. 

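Proof. First, assume that $z > 0$. Then $\lambda z - \Lambda(\lambda) \to +\infty$ when $\lambda \to +\infty$: thus $\Lambda^*(z) = +\infty$. 
Notice that this result is not surprising, since $\log(\hat{p}^{n,k}) \le 0$ almost surely. 

If $z < 0$, the map $\lambda \in \mathbb{R} \mapsto \lambda z - \Lambda(\lambda)$ admits the limit $-\infty$ for $\lambda \to \pm\infty$, and attains 
its maximum at the unique solution $\lambda_z$ of the equation $z - \Lambda'(\lambda) = 0$, which is given by 
$\lambda_z = -\log\left(\frac{z}{\log(p)}\right)$. Then $\Lambda^*(z) = \lambda_z z - \Lambda(\lambda_z)$, which gives (12). □ 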
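
The computation of the Fenchel-Legendre transform for $z < 0$ can be cross-checked numerically. The sketch below maximizes the concave map $\lambda \mapsto \lambda z - \Lambda(\lambda)$ by a ternary search and compares the result with the closed form $z - \log(p) - z\log(z/\log(p))$; the search interval, the number of steps and the value of $p$ are hypothetical choices made for the example.

```python
import math

def limit_log_laplace(lam, p):
    """Lambda(lambda) = -log(p) * (exp(-lambda) - 1)."""
    return -math.log(p) * (math.exp(-lam) - 1.0)

def legendre_closed_form(z, p):
    """Closed-form expression of the transform for z < 0."""
    return z - math.log(p) - z * math.log(z / math.log(p))

def legendre_numeric(z, p, lo=-20.0, hi=20.0, steps=200):
    """Maximize the concave map lambda -> lambda*z - Lambda(lambda) by ternary search."""
    def objective(lam):
        return lam * z - limit_log_laplace(lam, p)
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

p = 0.01
for z in (-0.5, -2.0, -4.6, -8.0):
    print(z, legendre_numeric(z, p), legendre_closed_form(z, p))
```

The search interval must contain the maximizer $\lambda_z = -\log(z/\log(p))$, which is the case for the values of $z$ used here.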

5 Proof of Proposition 4.2 

5.1 The case k = 1 

We start with a proof of Theorem 3.1 when $k = 1$: in this case, we have $C^{n,1} = 1$ almost 
surely, and the number of iterations $J^{n,1}$ follows a Poisson distribution $\mathcal{P}(-n\log(p))$ (see for 
instance [BLR15], [GHML11]). 

As a consequence, it is very easy to prove Proposition 4.1. Let $\lambda \in \mathbb{R}$. Then 

$$L_{n,1}(\lambda) = \exp\big(\Lambda_{n,1}(\lambda)\big) = \mathbb{E}\big[\exp\big(\lambda \log(\hat{p}^{n,1})\big)\big] = \mathbb{E}\big[\exp\big(\lambda \log(1 - 1/n)\, J^{n,1}\big)\big] = \exp\Big(-n\log(p)\big(\exp(\lambda \log(1 - 1/n)) - 1\big)\Big).$$

It is now easy to conclude: when $n \to +\infty$ 

$$\frac{1}{n} \log\big(L_{n,1}(n\lambda)\big) = -\log(p)\big(\exp(n\lambda \log(1 - 1/n)) - 1\big) \underset{n \to +\infty}{\longrightarrow} -\log(p)\big(\exp(-\lambda) - 1\big).$$

We have performed explicit calculations, using the knowledge of the distribution of $J^{n,1}$. 
However for $k > 1$, we cannot rely on such simple arguments and we need other tools. 
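
The convergence in the case $k = 1$ can be observed directly from the explicit formula obtained with the Poisson distribution. The following sketch evaluates $\frac{1}{n}\Lambda_{n,1}(n\lambda)$ for increasing values of $n$ and compares it with the limit $-\log(p)(\exp(-\lambda) - 1)$; the values of $\lambda$, $p$ and $n$ are hypothetical choices for the example.

```python
import math

def scaled_log_laplace_k1(lam, n, p):
    """(1/n) * Lambda_{n,1}(n * lam), using that the number of iterations
    is Poisson with mean -n log(p) when k = 1."""
    return -math.log(p) * (math.exp(n * lam * math.log(1.0 - 1.0 / n)) - 1.0)

def limit_log_laplace(lam, p):
    """Limit Lambda(lam) = -log(p) * (exp(-lam) - 1)."""
    return -math.log(p) * (math.exp(-lam) - 1.0)

p, lam = 0.01, 1.0
errors = []
for n in (10, 100, 1000, 10000):
    error = abs(scaled_log_laplace_k1(lam, n, p) - limit_log_laplace(lam, p))
    errors.append(error)
    print(n, error)
```

The error decreases roughly like $1/n$, in agreement with the expansion $n\log(1 - 1/n) = -1 - \frac{1}{2n} + o(1/n)$.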

We would like to use the connection with the Poisson distribution in order to give an 
interpretation of the rate functions $I$ and $J$. More precisely, $I$ is obtained from $J$ by the 
contraction principle ($I(y) = J(\log(y))$), and $J$ is the rate function obtained in the Cramér 
theorem where the distribution of $R$ is such that $-R \sim \mathcal{P}(-\log(p))$. Indeed, let $(R_m)_{m \in \mathbb{N}^*}$ be 
independent, with the same distribution as $R$; if we denote by $\overline{R}_n = \frac{1}{n}\sum_{m=1}^{n} R_m$ the empirical 
average, we compute for any $\lambda \in \mathbb{R}$ 

$$\mathbb{E}\big[\exp(n\lambda \overline{R}_n)\big] = \Big(\mathbb{E}\big[\exp(\lambda R)\big]\Big)^n = \Big(\exp\big(-\log(p)(\exp(-\lambda) - 1)\big)\Big)^n.$$

To conclude this section on the case $k = 1$, we prove Proposition 3.4. We use again the 
explicit knowledge of the distribution of $J^{n,1}$ and use a Central Limit Theorem on Poisson 
distributions to conclude. 


Proof of Proposition 3.4. We write (with $a = -\log(p) = \sigma^2 n$) 

$$\frac{\hat{p}^{n,1}}{p} = \exp\big(J^{n,1} \log(1 - 1/n) + a\big) = \exp\left(\frac{J^{n,1} - na}{\sqrt{na}}\, \sqrt{na}\, \log(1 - 1/n) + a + na \log(1 - 1/n)\right).$$

By the Central Limit Theorem on the Poisson distribution, one gets, in the limit $n \to +\infty$, 
the following convergence in distribution 

$$\frac{J^{n,1} - na}{\sqrt{na}} \to \mathcal{N}(0,1).$$

Moreover, when $n \to +\infty$, we have $\sqrt{na}\,\log(1 - 1/n) = n\sigma \log(1 - 1/n) \to -\sigma$ and $a + 
na\log(1 - 1/n) = \sigma^2\big(n + n^2 \log(1 - 1/n)\big)$ tends to $-\frac{\sigma^2}{2}$. This concludes the proof. 

□ 


5.2 The general case 

In this section, we give the main arguments used to prove Proposition 4.2 in the general case 
$k \in \mathbb{N}^*$. In particular, we want to show that the rate function we obtain does not depend on 
$k$. The proof of some important but technical results is postponed to Section 7. 

Even if in Section 5.1 above we have proved Proposition 4.2 in the case $k = 1$, we include 
this case in our general framework, and obtain an alternative proof. 

To this aim, we make use of the strategy introduced in [BLR15] to study the properties 
of the AMS$(n, k; a)$ algorithm. First, as explained in Section 2.3, we are allowed to 
restrict the study to the case where $X$ is exponentially distributed: it is enough to study the 
AMS$_{\mathrm{expo}}(n, k; a_{\mathrm{expo}})$ algorithm, where $a_{\mathrm{expo}} = -\log(p)$. 

Moreover, one of the main ideas is to consider the initial condition of the algorithm as 
an extra variable: for $x \in [0, a)$, we study the AMS$_{\mathrm{expo}}(n, k; a_{\mathrm{expo}}, x)$ algorithm. From now 
on, in this Section and in Section 7, we only consider the exponential case and we omit the 
dependence. 
Definition 5.1. We use the following notation: for any $x \ge 0$ and $y \in \mathbb{R}$, 

$$f(y) = \exp(-y)\mathbb{1}_{y \ge 0}, \quad F(y) = \big(1 - \exp(-y)\big)\mathbb{1}_{y \ge 0} = \int_{-\infty}^{y} f(z)\,dz;$$

$$f(y; x) = \frac{f(y)}{1 - F(x)}\mathbb{1}_{y \ge x}, \quad F(y; x) = \frac{F(y) - F(x)}{1 - F(x)}\mathbb{1}_{y \ge x} = \int_{x}^{y} f(z; x)\,dz;$$

$$f_{n,k}(y; x) = k \binom{n}{k} F(y; x)^{k-1} f(y; x) \big(1 - F(y; x)\big)^{n-k},$$

$$F_{n,k}(y; x) = \int_{x}^{y} f_{n,k}(z; x)\,dz.$$

Let $X$ be exponentially distributed with parameter 1. Then $f$ (resp. $F$) is the density (resp. 
the c.d.f.) of $\mathcal{L}(X)$. For $x \ge 0$, $f(\cdot\,; x)$ (resp. $F(\cdot\,; x)$) is the density (resp. the c.d.f.) of the 
conditional distribution $\mathcal{L}(X \mid X > x)$. 

Finally, let $(X_1, \ldots, X_n)$ be i.i.d. with the distribution $\mathcal{L}(X \mid X > x)$, with the associated order 
statistics $X_{(1)} < \ldots < X_{(n)}$. Then $f_{n,k}(\cdot\,; x)$ (resp. $F_{n,k}(\cdot\,; x)$) is the density (resp. the c.d.f.) 
of the $k$-th order statistic $X_{(k)}$. 
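
The density of the $k$-th order statistic can be checked numerically in the exponential case. The sketch below integrates $f_{n,k}(\cdot\,; x)$ with a midpoint rule and compares the result with the standard identity stating that the c.d.f. of the $k$-th order statistic at $y$ is the probability that at least $k$ of the $n$ samples are below $y$. The function names, the quadrature and the numerical values are hypothetical choices made for the example.

```python
import math

def cond_density(y, x):
    """Density of L(X | X > x) for X exponential with mean 1."""
    return math.exp(-(y - x)) if y >= x else 0.0

def cond_cdf(y, x):
    """C.d.f. of L(X | X > x) for X exponential with mean 1."""
    return 1.0 - math.exp(-(y - x)) if y >= x else 0.0

def order_stat_density(y, x, n, k):
    """Density f_{n,k}(y; x) of the k-th order statistic of n i.i.d. samples."""
    F = cond_cdf(y, x)
    return (k * math.comb(n, k) * F ** (k - 1)
            * cond_density(y, x) * (1.0 - F) ** (n - k))

def order_stat_cdf_integral(y, x, n, k, steps=5000):
    """Midpoint-rule approximation of the integral of f_{n,k}(.; x) over [x, y]."""
    h = (y - x) / steps
    return h * sum(order_stat_density(x + (i + 0.5) * h, x, n, k)
                   for i in range(steps))

def order_stat_cdf_binomial(y, x, n, k):
    """P(at least k of the n samples are <= y), a standard order-statistics identity."""
    F = cond_cdf(y, x)
    return sum(math.comb(n, j) * F ** j * (1.0 - F) ** (n - j)
               for j in range(k, n + 1))

n, k, x, y = 10, 3, 0.5, 1.0
print(order_stat_cdf_integral(y, x, n, k), order_stat_cdf_binomial(y, x, n, k))
```

Both values approximate $F_{n,k}(y; x)$ and should agree up to the quadrature error.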

The main object we need to study is the following function $\Gamma_{n,k}$ of $\lambda \in \mathbb{R}$ (considered as a 
fixed parameter) and the initial condition $x \in [0, a]$: 

$$\Gamma_{n,k}(\lambda; x) = \mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big)\big] = \exp\big(\Lambda_{n,k}(n\lambda; x)\big). \tag{13}$$

Notice that we include $x = a$ in the domain of definition of the functions $\Gamma_{n,k}$ and $\Lambda_{n,k}$ (defined 
by (10)). It is also important to remark that we evaluate the latter at $(n\lambda; x)$. 

We state several fundamental results which together yield Proposition 4.2 in the $x$-dependent 
case; to get (11) it is then enough to take $x = 0$. 

First, Proposition 5.2 gives a functional equation satisfied by $\Gamma_{n,k}(\lambda; \cdot)$ on $[0, a]$, for any 
value of the parameters $1 \le k < n$ and $\lambda \in \mathbb{R}$. 

We use the following auxiliary function: 

$$\Theta_{n,k}(\lambda; x) = \sum_{\ell=0}^{k-1} \exp\left(n\lambda \log\left(1 - \frac{\ell}{n}\right)\right) \big(F_{n,\ell}(a; x) - F_{n,\ell+1}(a; x)\big), \tag{14}$$

with the convention $F_{n,0}(y; x) = \mathbb{1}_{y \ge x}$. 

Proposition 5.2. For any $n \in \mathbb{N}^*$, $k \in \{1, \ldots, n-1\}$, and $\lambda \in \mathbb{R}$, the function $\Gamma_{n,k}(\lambda; \cdot)$ is 
solution on the interval $[0, a]$ of the functional equation (with the unknown $\Gamma$): 

$$\Gamma(x) = \int_{x}^{a} \exp\left(n\lambda \log\left(1 - \frac{k}{n}\right)\right) \Gamma(y) f_{n,k}(y; x)\,dy + \Theta_{n,k}(\lambda; x). \tag{15}$$

Notice that for the moment, it is not clear that $\Gamma_{n,k}$ is the unique solution of the functional 
equation (15). We will prove this property below. 

For completeness, we include a proof of this result, even if it follows the same lines as Proposition 
4.2 in [BLR15]. 
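Proof of Proposition 5.2. We decompose the expectation according to the value of the (random) 
number of iterations in the algorithm starting from $x$: 

$$\Gamma_{n,k}(\lambda; x) = \mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big)\big] = \mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big)\mathbb{1}_{J^{n,k}(x) = 0}\big] + \mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big)\mathbb{1}_{J^{n,k}(x) \ge 1}\big].$$

First, since $\{J^{n,k}(x) = 0\} = \{Z^1 \ge a\} = \bigcup_{\ell=0}^{k-1} \{X^0_{(\ell+1)} \ge a > X^0_{(\ell)}\}$ (with the convention $X^0_{(0)} = x$), we have 

$$\mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big)\mathbb{1}_{J^{n,k}(x) = 0}\big] = \mathbb{E}\big[\exp\big(n\lambda \log(C^{n,k}(x))\big)\mathbb{1}_{J^{n,k}(x) = 0}\big] = \sum_{\ell=0}^{k-1} \exp\left(n\lambda \log\left(1 - \frac{\ell}{n}\right)\right) \big(F_{n,\ell}(a; x) - F_{n,\ell+1}(a; x)\big) = \Theta_{n,k}(\lambda; x).$$

Second, we use $\{J^{n,k}(x) \ge 1\} = \{Z^1 < a\}$ and condition with respect to $Z^1$: 

$$\mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big)\mathbb{1}_{J^{n,k}(x) \ge 1}\big] = \mathbb{E}\Big[\mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(x))\big) \,\big|\, Z^1\big]\mathbb{1}_{Z^1 < a}\Big]$$

$$= \mathbb{E}\Big[\mathbb{E}\big[\exp\big(n\lambda \log(\hat{p}^{n,k}(Z^1)) + n\lambda \log(1 - k/n)\big) \,\big|\, Z^1\big]\mathbb{1}_{Z^1 < a}\Big]$$

$$= \exp\left(n\lambda \log\left(1 - \frac{k}{n}\right)\right) \mathbb{E}\Big[\mathbb{E}\big[\exp\big(n\lambda \log\big((1 - k/n)^{J^{n,k}(Z^1)} C^{n,k}(Z^1)\big)\big) \,\big|\, Z^1\big]\mathbb{1}_{Z^1 < a}\Big]$$

$$= \exp\left(n\lambda \log\left(1 - \frac{k}{n}\right)\right) \mathbb{E}\big[\Gamma_{n,k}(\lambda; Z^1)\mathbb{1}_{Z^1 < a}\big]$$

$$= \int_{x}^{a} \exp\left(n\lambda \log\left(1 - \frac{k}{n}\right)\right) \Gamma_{n,k}(\lambda; y) f_{n,k}(y; x)\,dy.$$

We have used a kind of Markov property for the algorithm: up to taking into account one 
more iteration, the algorithm behaves the same starting from $x$ or from $Z^1 \in (x, a]$. □ 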


Notice that the functional equation (15) involves a simple factor depending only on $\lambda$,
$n$ and $k$ in the integral, and that on both the left- and the right-hand sides the function $\Gamma$
is evaluated at the same value of the parameter $\lambda$. These observations are consequences of
the choice to prove a Large Deviations Principle for $\log(\hat{p}^{n,k})$ (instead of $\hat{p}^{n,k}$) thanks to the
Gärtner-Ellis Theorem, and to conclude with the use of the contraction principle; the same
trick was used in [BGT14] to prove the Central Limit Theorem, thanks to the delta-method
and the use of the Lévy Theorem. If one replaces $\log(\hat{p}^{n,k}(x))$ with $\hat{p}^{n,k}(x)$ in (13), then one
obtains a more complicated functional equation where the observations above do not hold,
and which is not easily exploitable. In particular, one does not obtain a nice counterpart of
the fundamental result, Proposition 5.3 below.

We now state in Proposition 5.3 that solutions $\Gamma$ of the functional equation (15) are in
fact solutions of a linear Ordinary Differential Equation (ODE) of order $k$, with constant
coefficients.


Proposition 5.3. For any $n \in \mathbb{N}^*$, $k \in \{1,\ldots,n-1\}$, and $\lambda \in \mathbb{R}$, let $\Gamma$ be a solution of the
functional equation (15). Then it is a solution of the following linear ODE of order $k$:

$$\frac{d^k}{dx^k}\Gamma(x) = \exp\Big(n\lambda\log\big(1-\tfrac{k}{n}\big)\Big)\mu^{n,k}\,\Gamma(x) + \sum_{m=0}^{k-1} r_m^{n,k}\frac{d^m}{dx^m}\Gamma(x). \qquad (16)$$

The coefficients $\mu^{n,k}$ and $(r_m^{n,k})_{0\le m\le k-1}$ satisfy the following properties:

$$\begin{aligned}
\mu^{n,k} &= (-1)^k n\cdots(n-k+1), \\
\nu^k - \sum_{m=0}^{k-1} r_m^{n,k}\nu^m &= (\nu-n)\cdots(\nu-n+k-1) \quad \text{for all } \nu \in \mathbb{R}.
\end{aligned} \qquad (17)$$

A sketch of proof of this result is postponed to Section 7. It uses the same arguments as the
proof of the corresponding functional equation in [BLR15]. For the proof of (17) in particular,
we refer to that article.

To conclude on uniqueness of the solution of (15), and then prove asymptotic expansions
on $\Gamma_{n,k}$, we prove the following Lemma.

Lemma 5.4. For any fixed $k \in \{1, 2, \ldots\}$ and any $\lambda \in \mathbb{R}$, we have for any $m \in \{0,\ldots,k-1\}$

$$\frac{d^m}{dx^m}\Gamma_{n,k}(\lambda;x)\Big|_{x=a} = \frac{d^m}{dx^m}\Theta_{n,k}(\lambda;x)\Big|_{x=a} \underset{n\to+\infty}{\sim} n^m\big(1-\exp(-\lambda)\big)^m. \qquad (18)$$

By Cauchy-Lipschitz theory, the linear ODE (16) with the conditions (18) at $x = a$ admits
a unique solution; therefore it is clear that $\Gamma_{n,k}$ is the unique solution of (15).


Remark 5.5. To prove the Central Limit Theorem in [BGT14], we used a similar result
although in a weaker form: we only needed to prove $\frac{d^m}{dx^m}\Theta_{n,k}(\lambda;x)\big|_{x=a} = O(n^m)$. Here we
require a more precise asymptotic result in order to prove that the coefficient $\gamma^1_{n,k}(\lambda)$ defined
in Proposition 5.6 below converges to 1 (in fact, we only need that it is bounded from below by
a positive constant).


We finally explain how to obtain asymptotic knowledge on $\Gamma_{n,k}(\lambda;x)$ and $\Lambda_{n,k}(n\lambda;x)$ when
$n \to +\infty$. First, the $k$ roots $(\nu^\ell_{n,k}(\lambda))_{1\le \ell\le k}$ of the polynomial equation associated with the
linear ODE (16) are pairwise distinct for $n$ large enough (the other parameters $\lambda$ and $k$ being
fixed), and more precisely they satisfy (20). As a consequence, the solution $\Gamma_{n,k}$ can be written
(see (19)) as a linear combination of the exponential functions $x \mapsto \exp\big(\nu^\ell_{n,k}(\lambda)(x-a)\big)$. Finally,
using the asymptotic expression for the derivatives of order $0,\ldots,k-1$ at $x = a$, we obtain
a linear system of equations, solve it using Cramer's formulae and obtain the asymptotic
expression (21). The proof is postponed to Section 7.


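Proposition 5.6. Let $k \in \{1, 2, \ldots\}$ and $\lambda \in \mathbb{R}$ be fixed. Then for $n$ large enough, we have for
any $x \in [0,a]$

$$\Gamma_{n,k}(\lambda;x) = \sum_{\ell=1}^{k}\gamma^\ell_{n,k}(\lambda)\exp\big(\nu^\ell_{n,k}(\lambda)(x-a)\big), \qquad (19)$$

where

$$\nu^\ell_{n,k}(\lambda) \underset{n\to+\infty}{\sim} n\big(1 - e^{-\lambda}e^{i2\pi(\ell-1)/k}\big), \qquad (20)$$

and

$$\gamma^\ell_{n,k}(\lambda) \underset{n\to+\infty}{\to} \mathbb{1}_{\ell=1}. \qquad (21)$$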
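We now conclude and prove Proposition 4.1, namely the Large Deviations Principle for
$\big(\mathcal{L}(\log(\hat{p}^{n,k}))\big)_{n>k}$.

We start with the case $k > 1$. Then for any $\ell \in \{2,\ldots,k\}$ we have for any $\lambda \in \mathbb{R}$

$$\mathrm{Re}\big(1 - e^{-\lambda}e^{i2\pi(\ell-1)/k}\big) > \mathrm{Re}\big(1 - e^{-\lambda}\big).$$

As a consequence, for $x < a$ and any $\ell \in \{2,\ldots,k\}$ we have when $n \to +\infty$

$$\exp\big(\nu^\ell_{n,k}(\lambda)(x-a)\big) = o\Big(e^{n(1-\exp(-\lambda))(x-a)}\Big),$$

and thus

$$\frac{1}{n}\Lambda_{n,k}(n\lambda;x) = \frac{1}{n}\log\big(\Gamma_{n,k}(\lambda;x)\big) \underset{n\to+\infty}{\sim} \frac{1}{n}\nu^1_{n,k}(\lambda)(x-a) \underset{n\to+\infty}{\to} (1-e^{-\lambda})(x-a).$$

When $k = 1$, the linear ODE (16) is of order 1, and it is easy to check that

$$\Gamma_{n,1}(\lambda;x) = \exp\big(\nu^1_{n,1}(\lambda)(x-a)\big),$$

so that the same asymptotic result as above holds.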
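The case $k = 1$ can be checked numerically. The sketch below assumes the explicit root $\nu^1_{n,1}(\lambda) = n\big(1 - (1-1/n)^{n\lambda}\big)$, which is what the characteristic equation of the order-1 ODE gives when $k = 1$; the function names and the parameter values are illustrative choices, not taken from the text.

```python
import math


def scaled_log_gamma_k1(n, lam, x, a):
    """(1/n) * log Gamma_{n,1}(lam; x), assuming Gamma_{n,1} = exp(nu * (x - a))."""
    # Root of the order-1 characteristic equation: (n - nu)/n = (1 - 1/n)**(n*lam)
    nu = n * (1.0 - (1.0 - 1.0 / n) ** (n * lam))
    return nu * (x - a) / n


def limit_value(lam, x, a):
    """Limit (1 - exp(-lam)) * (x - a) stated for the scaled log-Laplace transform."""
    return (1.0 - math.exp(-lam)) * (x - a)


if __name__ == "__main__":
    a = -math.log(0.01)  # threshold such that p = P(X > a) = 0.01 for X ~ Exp(1)
    for n in (10, 100, 10_000, 1_000_000):
        print(n, scaled_log_gamma_k1(n, 0.5, 0.0, a), limit_value(0.5, 0.0, a))
```

The printed values approach the limit at rate $O(1/n)$, consistent with the asymptotic result above.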

It remains to take $x = 0$, and to recall that $a = -\log(p)$ if $p = \mathbb{P}(X > a)$ and $X$ is
exponentially distributed with parameter 1.

This concludes the proof of Proposition 4.1.
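For readers who want to experiment with the estimator studied in this section, here is a minimal Python sketch of the idealized AMS($n$, $k$) algorithm for $X$ exponentially distributed with parameter 1, started from $x = 0$. The function names, the use of the standard `random` module and the parameter values are assumptions made for the illustration. Taking $\lambda = 1/n$ in (13) gives $\Gamma_{n,k}(1/n;0) = \mathbb{E}[\hat{p}^{n,k}]$, so the empirical mean should be close to $p = e^{-a}$ (unbiasedness, see [BLR15]).

```python
import math
import random


def ams_estimate(n, k, a, rng):
    """One realization of the idealized AMS(n, k) estimator for X ~ Exp(1), x = 0."""
    replicas = [rng.expovariate(1.0) for _ in range(n)]
    iterations = 0
    while True:
        replicas.sort()
        level = replicas[k - 1]  # k-th order statistic Z
        if level >= a:
            break
        # Idealized setting: resample the k lowest replicas from L(X | X > Z).
        # For the exponential law this is Z + Exp(1) (memoryless property).
        for i in range(k):
            replicas[i] = level + rng.expovariate(1.0)
        iterations += 1
    fraction_above = sum(1 for value in replicas if value >= a) / n
    return fraction_above * (1.0 - k / n) ** iterations


def mean_estimate(n, k, p, runs, seed):
    """Average of independent AMS estimates of p = P(X > -log p)."""
    rng = random.Random(seed)
    a = -math.log(p)
    return sum(ams_estimate(n, k, a, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for k in (1, 5):
        print(k, mean_estimate(n=50, k=k, p=0.01, runs=400, seed=0))
```

Estimating $\Gamma_{n,k}(\lambda;0)$ for a fixed $\lambda$ of order 1 by plain averaging would be much noisier, since the expectation is then dominated by rare values of the number of iterations.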


6 Comparison with other algorithms 

We propose a comparison (in terms of large deviations) of the Adaptive Multilevel Splitting
algorithm with the two other methods described in the Introduction: a direct, naive Monte-Carlo
method, based on a non-interacting system of replicas with the same size (see the
estimator (1)), and a non-adaptive version of multilevel splitting (see the estimator (2)).

In the first case, we obtain that large deviations are much less likely for the AMS algorithm
than for the crude Monte-Carlo method. In the second case, we show that the AMS estimator
is at least as efficient as the non-adaptive one taken in the limit of a large number $N$ of fixed
levels.

These results are consistent with the cost analysis and the comparison based on the central
limit theorem, see [BLR15], [BGT14], [CDMFG12], [CG14].

6.1 Crude Monte-Carlo 

We compare the performance of the AMS algorithm with the use of a crude Monte-Carlo
estimation in the large $n$ limit.

Let $(X_m)_{m\in\mathbb{N}^*}$ be a sequence of independent and identically distributed random variables,
each one being equal in law to $X$.

Then for any $n \in \mathbb{N}^*$

$$\overline{p}_n = \frac{1}{n}\sum_{m=1}^{n}\mathbb{1}_{X_m > a} \qquad (22)$$

is an unbiased estimator of $p$.
It is a classical result (Theorem 2.2.3 in [DZ10]) due to Cramér that the sequence $\big(\mathcal{L}(\overline{p}_n)\big)_{n\in\mathbb{N}^*}$
satisfies a Large Deviations Principle with the rate function (case of Bernoulli random variables,
see Exercise 2.2.23 in [DZ10]):

$$\mathcal{I}(y) = \begin{cases} +\infty & \text{if } y \notin (0,1), \\ y\log\big(\frac{y}{p}\big) + (1-y)\log\big(\frac{1-y}{1-p}\big) & \text{if } y \in (0,1). \end{cases}$$

The comparison between the algorithms is based on the following result:

Proposition 6.1. For any $p \in (0,1)$ and any $y \in (0,1)$, we have

$$I(y) \ge \mathcal{I}(y), \qquad I(y) = \mathcal{I}(y) \text{ if and only if } y = p. \qquad (23)$$


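Proof. We explicitly mention the dependence of $I$ and of $\mathcal{I}$ with respect to $p$, and we define

$$D(y,p) = I(y,p) - \mathcal{I}(y,p).$$

It is clear that $D(p,p) = 0$ for any $p \in (0,1)$. We compute that

$$\frac{\partial D(y,p)}{\partial p} = \frac{1-y}{p\log(p)}\Big(\frac{\log(y)}{1-y} - \frac{\log(p)}{1-p}\Big);$$

since the function $t \mapsto \frac{\log(t)}{1-t}$ is strictly increasing on $(0,1)$ (as can be seen by computing its
first and second order derivatives) and $\log(p) < 0$, we see that for any $(y,p) \in (0,1)^2$ we have the inequalities

$$\frac{\partial D(y,p)}{\partial p} < 0 \text{ if } y > p \quad \text{and} \quad \frac{\partial D(y,p)}{\partial p} > 0 \text{ if } y < p.$$

Using $D(y,y) = 0$, it is easy to conclude: for fixed $y$, the map $p \mapsto D(y,p)$ attains its minimum,
equal to 0, only at $p = y$.
□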
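Proposition 6.1 can also be checked numerically on a grid. The sketch below assumes the closed form $I(y) = \log(y)\log\big(\log(p)/\log(y)\big) - \log(p) + \log(y)$ for the AMS rate function on $(0,1)$, obtained as the Legendre transform of $\lambda \mapsto \log(p)(1-e^{-\lambda})$ composed with $y \mapsto \log(y)$; this expression, the function names and the grid step are assumptions of the example.

```python
import math


def rate_ams(y, p):
    """Assumed closed form of the AMS rate function I(y) for y, p in (0, 1)."""
    return math.log(y) * math.log(math.log(p) / math.log(y)) - math.log(p) + math.log(y)


def rate_mc(y, p):
    """Cramer rate function of the crude Monte-Carlo estimator (Bernoulli case)."""
    return y * math.log(y / p) + (1.0 - y) * math.log((1.0 - y) / (1.0 - p))


def min_gap(step=0.01):
    """Smallest value of I(y) - Cramer rate over a regular grid of (y, p) in (0, 1)^2."""
    grid = [i * step for i in range(1, int(round(1.0 / step)))]
    return min(rate_ams(y, p) - rate_mc(y, p) for p in grid for y in grid)


if __name__ == "__main__":
    print("minimum of D(y, p) on the grid:", min_gap())
    print("I(0.1), Cramer rate at 0.1 for p = 0.3:", rate_ams(0.1, 0.3), rate_mc(0.1, 0.3))
```

On the grid the difference vanishes on the diagonal $y = p$ and is positive elsewhere, as the proposition states.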


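Now let $\epsilon \in (0,\max(p,1-p))$; then for $n$ large we have

$$\frac{\mathbb{P}(\hat{p}^{n,k} - p > \epsilon)}{\mathbb{P}(\overline{p}_n - p > \epsilon)} \underset{n\to+\infty}{\to} 0,$$

exponentially fast: writing $A(\epsilon,n)$ for $\frac{1}{n}$ times the logarithm of this ratio, we have by the Large
Deviations Principles $A(\epsilon,n) \to \mathcal{I}(p+\epsilon) - I(p+\epsilon) < 0$ when $n \to +\infty$ (notice that both $I$ and
$\mathcal{I}$ are increasing on $(p,1)$).

The same arguments apply to get

$$\frac{\mathbb{P}(\hat{p}^{n,k} - p < -\epsilon)}{\mathbb{P}(\overline{p}_n - p < -\epsilon)} \underset{n\to+\infty}{\to} 0.$$

As a consequence, the probability of observing large deviations from the mean $p$ is much
smaller for the AMS algorithm than when using a crude Monte-Carlo estimator, in the large
$n$ limit. This statement is a new way of expressing the efficiency of the AMS algorithm.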

Notice that in the discussion above we have not assumed that we are estimating a probability
in a rare event regime: the conclusion holds for any $p \in (0,1)$. Now it is also instructive
to compare $I((1+\epsilon)p)$ and $\mathcal{I}((1+\epsilon)p)$ for a given $\epsilon \in (0,1)$ and when $p \to 0$: it amounts to
looking at deviations of the relative error, and we have

$$\lim_{n\to+\infty} -\frac{1}{n}\log\Big(\mathbb{P}\Big(\frac{\hat{p}^{n,k}-p}{p} > \epsilon\Big)\Big) = I(p(1+\epsilon)) \underset{p\to 0}{\sim} \frac{\big(\log(1+\epsilon)\big)^2}{-2\log(p)},$$

$$\lim_{n\to+\infty} -\frac{1}{n}\log\Big(\mathbb{P}\Big(\frac{\overline{p}_n-p}{p} > \epsilon\Big)\Big) = \mathcal{I}(p(1+\epsilon)) \underset{p\to 0}{\sim} p\big((1+\epsilon)\log(1+\epsilon) - \epsilon\big).$$

Given $\delta > 0$, in order to have a probability lower than $\delta$ that the relative error is larger than
$\epsilon$, in the small $p$ limit, one thus needs a number of replicas $n$ which scales like $1/p$ when using
the crude Monte-Carlo method, while it scales like $-\log(p)$ (which is much smaller) when
using the AMS algorithm. Moreover, since the expected workload is of size $n$ when using the
Monte-Carlo method and of size $-n\log(p)$ when using the AMS algorithm, it is clear that in
terms of large deviations from the mean the AMS algorithm is more efficient than the crude
Monte-Carlo method.
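To make the scaling in $p$ concrete, the following sketch compares the two rates at $y = (1+\epsilon)p$ with their small-$p$ equivalents, and converts them into a heuristic number of replicas $n \approx \log(1/\delta)/\text{rate}$. This conversion ignores sub-exponential prefactors and is only an order-of-magnitude guide; the closed form used for $I$ and all function names are assumptions of the example.

```python
import math


def rate_ams(y, p):
    """Assumed closed form of the AMS rate function I(y) for y, p in (0, 1)."""
    return math.log(y) * math.log(math.log(p) / math.log(y)) - math.log(p) + math.log(y)


def rate_mc(y, p):
    """Cramer rate function of the crude Monte-Carlo estimator."""
    return y * math.log(y / p) + (1.0 - y) * math.log((1.0 - y) / (1.0 - p))


def ams_small_p(eps, p):
    """Small-p equivalent of I((1 + eps) p)."""
    return math.log(1.0 + eps) ** 2 / (-2.0 * math.log(p))


def mc_small_p(eps, p):
    """Small-p equivalent of the Cramer rate at (1 + eps) p."""
    return p * ((1.0 + eps) * math.log(1.0 + eps) - eps)


def replicas_needed(rate, delta):
    """Heuristic n such that exp(-n * rate) <= delta (prefactors ignored)."""
    return math.log(1.0 / delta) / rate


if __name__ == "__main__":
    eps, delta = 0.5, 0.05
    for p in (1e-3, 1e-6, 1e-9):
        y = (1.0 + eps) * p
        print(p, replicas_needed(rate_ams(y, p), delta), replicas_needed(rate_mc(y, p), delta))
```

For $p = 10^{-6}$ and $\epsilon = 0.5$ the heuristic replica count differs by several orders of magnitude between the two methods, even before accounting for the extra factor $-\log(p)$ in the AMS workload.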

Notice that this discussion is consistent with the conclusions coming from the Central
Limit Theorem, where in the regime $p \to 0$ the asymptotic variance is equivalent to $p$ when
using the crude Monte-Carlo method and to $-p^2\log(p)$ when using the AMS algorithm: to obtain
reliable confidence intervals on the relative error, the number of replicas $n$ scales in the same
way.


6.2 Non-adaptive Multilevel Splitting 

We now compare the rate function $I$ obtained for the Large Deviations Principle on the AMS
algorithm with the one we obtain when using a deterministic (non-adaptive) sequence of
levels.

Namely, using Assumption 2.1, we decompose the probability as a telescoping product of
$N \in \mathbb{N}^*$ conditional probabilities

$$p = \mathbb{P}(X > a) = \prod_{i=1}^{N}\mathbb{P}(X > a_i \,|\, X > a_{i-1}), \qquad (24)$$

associated with a given increasing sequence of levels $a_0 = 0 < a_1 < \ldots < a_N = a$. We
denote by $p^{(i)} = \mathbb{P}(X > a_i \,|\, X > a_{i-1})$ the $i$-th conditional probability. The sequence is of size
$N$ and we study the asymptotic regime $N \to +\infty$.

We can define an unbiased estimator of $p$ as follows: let $n \in \mathbb{N}^*$ and set

$$\hat{p}_n^N = \prod_{i=1}^{N}\overline{p}_n^{(i)}, \qquad (25)$$

where $(\overline{p}_n^{(i)})_{1\le i\le N}$ is a family of independent random variables, where each $\overline{p}_n^{(i)}$ is a crude
Monte-Carlo estimator (as defined in the section above) for the probability $p^{(i)}$ with $n$ realizations.
More precisely, let $(X_m^{(i)})_{1\le i\le N,\,1\le m\le n}$ be independent random variables, such that
$\mathcal{L}(X_m^{(i)}) = \mathcal{L}(X \,|\, X > a_{i-1})$, and set

$$\overline{p}_n^{(i)} = \frac{1}{n}\sum_{m=1}^{n}\mathbb{1}_{X_m^{(i)} > a_i}. \qquad (26)$$

From a practical point of view, notice that the computation of these estimators requires the
sampling of random variables according to the conditional distribution $\mathcal{L}(X \,|\, X > a_{i-1})$ for
each $i \in \{1,\ldots,N\}$, just like for the adaptive version.
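The construction (24)-(26) can be sketched in a few lines for $X$ exponentially distributed with parameter 1. In that case the equally spaced levels $a_i = ia/N$ give equal conditional probabilities $p^{1/N}$, and $\mathcal{L}(X \,|\, X > a_{i-1})$ is sampled as $a_{i-1}$ plus an independent exponential variable. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def fixed_level_estimate(n, num_levels, a, rng):
    """Product of crude Monte-Carlo estimators of P(X > a_i | X > a_{i-1}), X ~ Exp(1)."""
    estimate = 1.0
    for i in range(1, num_levels + 1):
        lower = a * (i - 1) / num_levels
        upper = a * i / num_levels
        # Memoryless property: L(X | X > lower) is lower + Exp(1).
        hits = sum(1 for _ in range(n) if lower + rng.expovariate(1.0) > upper)
        estimate *= hits / n
    return estimate


def mean_fixed_level(n, num_levels, p, runs, seed):
    """Average of independent fixed-level splitting estimates of p."""
    rng = random.Random(seed)
    a = -math.log(p)
    return sum(fixed_level_estimate(n, num_levels, a, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print(mean_fixed_level(n=200, num_levels=5, p=0.01, runs=300, seed=0))
```

The empirical mean should be close to $p$, in line with the unbiasedness of (25).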

Here $n$ thus denotes the number of replicas used for the estimation of the probabilities
in both the adaptive and the non-adaptive versions. We needed the extra parameter $N$ to
denote the number of iterations (i.e. the length of the sequence of levels) of the algorithm,
while we know that the average number of iterations is of the order $-n\log(p)/k$ in the adaptive
case. Therefore, to study the non-adaptive version, we first let $n \to +\infty$, and then analyze
the behavior of the asymptotic quantities with respect to $N$ (in the limit $N \to +\infty$), while
for the adaptive version we need to pass to the limit only once, namely $n \to +\infty$.

Clearly, by the independence properties of the random variables introduced here we have

$$\mathbb{E}[\hat{p}_n^N] = p.$$

Moreover, it is well-known that, for a given value of $N$ (the length of the sequence of levels),
the asymptotic variance (when $n$ goes to $+\infty$) is minimized when $p^{(i)} = p^{1/N}$ for any
$i \in \{1,\ldots,N\}$ (i.e. the conditional probabilities in (24) are equal); moreover the asymptotic
variance is a decreasing function of $N$, which converges to $-p^2\log(p)$ when $N \to +\infty$. From a
practical point of view, the computation of the associated sequence of levels $a_1,\ldots,a_{N-1}$ is a
priori difficult: the adaptive version overcomes this issue, and in the regime $N \to +\infty$ both
the non-adaptive and the adaptive versions have the same statistical properties.

As a consequence, from now on we assume that $p^{(i)} = p^{1/N}$ for any $i \in \{1,\ldots,N\}$.

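For any $i \in \{1,\ldots,N\}$, $\big(\mathcal{L}(\overline{p}_n^{(i)})\big)_{n\in\mathbb{N}^*}$ satisfies a Large Deviations Principle with the rate
function (see (23))

$$\mathcal{I}_N(y) = \begin{cases} +\infty & \text{if } y \notin (0,1), \\ y\log\big(\frac{y}{p^{1/N}}\big) + (1-y)\log\big(\frac{1-y}{1-p^{1/N}}\big) & \text{if } y \in (0,1). \end{cases} \qquad (27)$$

Since for any $n \in \mathbb{N}^*$ the random variables $(\overline{p}_n^{(i)})_{1\le i\le N}$ are independent, it is easy to generalize
this statement as follows. The sequence $\big(\mathcal{L}(\overline{p}_n^{(1)},\ldots,\overline{p}_n^{(N)})\big)_{n\in\mathbb{N}^*}$ satisfies a Large Deviations
Principle in $\mathbb{R}^N$ with the rate function (with abuse of notation, $\mathcal{I}_N$ refers both to the function
depending on a 1-dimensional or an $N$-dimensional variable)

$$\mathcal{I}_N(y_1,\ldots,y_N) = \sum_{i=1}^{N}\mathcal{I}_N(y_i). \qquad (28)$$

Now as a consequence of the contraction principle, since $\hat{p}_n^N = \prod_{i=1}^{N}\overline{p}_n^{(i)}$, the sequence
$\big(\mathcal{L}(\hat{p}_n^N)\big)_{n\in\mathbb{N}^*}$ also satisfies a Large Deviations Principle with the rate function

$$I_N(y) = \inf\Big\{\mathcal{I}_N(y_1,\ldots,y_N) \; ; \; y = \prod_{i=1}^{N}y_i\Big\}. \qquad (29)$$

On the one hand, it is clear that if $y \notin (0,1)$, then $I_N(y) = +\infty$. Indeed, for any $(y_1,\ldots,y_N)$
satisfying the constraint $y = \prod_{i=1}^{N}y_i \notin (0,1)$, at least one of the $y_i$'s satisfies $y_i \notin (0,1)$, which
yields $\mathcal{I}_N(y_i) = \mathcal{I}_N(y_1,\ldots,y_N) = +\infty$.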
On the other hand, by definition of $I_N$, we have for any $y \in (0,1)$

$$I_N(y) \le \mathcal{I}_N(y^{1/N},\ldots,y^{1/N}) = N\mathcal{I}_N(y^{1/N}) \underset{N\to+\infty}{\to} \log(y)\log\Big(\frac{\log(p)}{\log(y)}\Big) - \log(p) + \log(y) = I(y).$$

For our purpose, this inequality is sufficient.
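The convergence of $N\mathcal{I}_N(y^{1/N})$ as $N \to +\infty$ is easy to observe numerically. The sketch below compares it with the closed form $\log(y)\log\big(\log(p)/\log(y)\big) - \log(p) + \log(y)$, which is assumed here to be the expression of $I(y)$; function names and test values are illustrative.

```python
import math


def rate_ams(y, p):
    """Assumed closed form of the AMS rate function I(y) for y, p in (0, 1)."""
    return math.log(y) * math.log(math.log(p) / math.log(y)) - math.log(p) + math.log(y)


def scaled_fixed_level_rate(y, p, num_levels):
    """N * I_N(y**(1/N)): upper bound on the fixed-level rate function at y."""
    y_root = y ** (1.0 / num_levels)
    p_root = p ** (1.0 / num_levels)
    one_level = (y_root * math.log(y_root / p_root)
                 + (1.0 - y_root) * math.log((1.0 - y_root) / (1.0 - p_root)))
    return num_levels * one_level


if __name__ == "__main__":
    y, p = 0.1, 0.3
    for num_levels in (1, 10, 100, 10_000, 1_000_000):
        print(num_levels, scaled_fixed_level_rate(y, p, num_levels), rate_ams(y, p))
```

For $N = 1$ the quantity reduces to the crude Monte-Carlo rate function, and it approaches the AMS value as $N$ grows.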

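We now interpret the previous inequality in terms of asymptotic estimates for deviations
of $\hat{p}_n^N$ and of $\hat{p}^{n,k}$ with respect to their expected value $p$. Let $\epsilon > 0$; then we have by definition
of the Large Deviations Principle with rate function $I_N$

$$\begin{aligned}
\liminf_{n\to+\infty}\frac{1}{n}\log\Big(\mathbb{P}\big(|\hat{p}_n^N - p| > \epsilon\big)\Big) &\ge -\inf\big\{I_N(y) \; ; \; |y-p| > \epsilon\big\} \\
&\ge -\inf\big\{N\mathcal{I}_N(y^{1/N}) \; ; \; |y-p| > \epsilon\big\} \\
&\ge -\min\big\{N\mathcal{I}_N((p+\epsilon)^{1/N}),\, N\mathcal{I}_N((p-\epsilon)^{1/N})\big\},
\end{aligned}$$

using that $\mathcal{I}_N$ is non-increasing on $(-\infty,p^{1/N})$ and non-decreasing on $(p^{1/N},+\infty)$.

To conclude, notice that

$$\begin{aligned}
\lim_{N\to+\infty} -\min\big\{N\mathcal{I}_N((p+\epsilon)^{1/N}),\, N\mathcal{I}_N((p-\epsilon)^{1/N})\big\} &= -\min\{I(p+\epsilon), I(p-\epsilon)\} \\
&= \lim_{n\to+\infty}\frac{1}{n}\log\Big(\mathbb{P}\big(|\hat{p}^{n,k} - p| > \epsilon\big)\Big).
\end{aligned}$$

We can thus assert that the Adaptive Multilevel Splitting algorithm is more efficient (in
a large sense) than the non-adaptive version in terms of large deviations when the number of
replicas $n$ goes to $+\infty$ and in the limit of a large number $N$ of levels.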


7 Proof of the technical estimates 

In this section, we give detailed proofs for the technical auxiliary results used in Section 5.2.

Proof of Proposition 5.3. We proceed by recursion, like in the proof of Proposition 6.4 in
[BLR15] and Lemma 2 in [BGT14]. We fix the values of $1 \le k < n$ and of $\lambda \in \mathbb{R}$.

Differentiating recursively with respect to $x$, for any $0 \le l \le k-1$ and for any $0 \le x \le a$
we have (for a family of coefficients described by (32) below)


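$$\begin{aligned}
\frac{d^l}{dx^l}\big(\Gamma_{n,k}(\lambda;x) - \Theta_{n,k}(\lambda;x)\big) = {} & \mu_l^{n,k}\exp\Big(n\lambda\log\big(1-\tfrac{k}{n}\big)\Big)\int_x^a \Gamma_{n,k}(\lambda;y)f_{n,k-l}(y;x)\,dy \\
& + \sum_{m=0}^{l-1} r_{m,l}^{n,k}\frac{d^m}{dx^m}\big(\Gamma_{n,k}(\lambda;x) - \Theta_{n,k}(\lambda;x)\big),
\end{aligned} \qquad (30)$$

and that differentiating once more we get

$$\begin{aligned}
\frac{d^k}{dx^k}\big(\Gamma_{n,k}(\lambda;x) - \Theta_{n,k}(\lambda;x)\big) = {} & \mu^{n,k}\exp\Big(n\lambda\log\big(1-\tfrac{k}{n}\big)\Big)\Gamma_{n,k}(\lambda;x) \\
& + \sum_{m=0}^{k-1} r_m^{n,k}\frac{d^m}{dx^m}\big(\Gamma_{n,k}(\lambda;x) - \Theta_{n,k}(\lambda;x)\big),
\end{aligned} \qquad (31)$$

with $\mu^{n,k} := \mu_k^{n,k}$ and $r_m^{n,k} := r_{m,k}^{n,k}$.

The coefficients satisfy

$$\begin{cases}
\mu_0^{n,k} = 1, \quad \mu_{l+1}^{n,k} = -(n-k+l+1)\mu_l^{n,k}; \\
r_{0,l+1}^{n,k} = -(n-k+l+1)\,r_{0,l}^{n,k}, \quad \text{if } l \ge 0, \\
r_{m,l+1}^{n,k} = r_{m-1,l}^{n,k} - (n-k+l+1)\,r_{m,l}^{n,k}, \quad 1 \le m \le l, \\
r_{l,l}^{n,k} = -1.
\end{cases} \qquad (32)$$

Notice that these coefficients do not depend on $\lambda$, and are the same as in [BLR15] and [BGT14].
Properties (17) are proved in [BLR15].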
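The recursion for the coefficients can be checked against the polynomial identity (17) with exact integer arithmetic. The sketch below implements one reading of (32), with the conventions $r_{l,l}^{n,k} = -1$ and the first relation applied from $l = 0$; these conventions, the index shift $n-k+l+1$ in every relation, and all function names are assumptions of the example.

```python
def r_coefficients(n, k):
    """Coefficients r_{m,k}^{n,k}, m = 0..k-1, computed by the recursion (32)."""
    level = [-1]  # l = 0: only the convention r_{0,0} = -1
    for l in range(k):
        shift = n - k + l + 1
        nxt = [-shift * level[0]]
        nxt += [level[m - 1] - shift * level[m] for m in range(1, l + 1)]
        nxt.append(-1)  # convention r_{l+1,l+1} = -1
        level = nxt
    return level[:k]


def mu_coefficient(n, k):
    """mu^{n,k} obtained from mu_0 = 1 and mu_{l+1} = -(n-k+l+1) mu_l."""
    value = 1
    for l in range(k):
        value *= -(n - k + l + 1)
    return value


def characteristic_lhs(nu, n, k):
    """nu^k - sum_m r_m nu^m, the left-hand side of the identity (17)."""
    r = r_coefficients(n, k)
    return nu ** k - sum(r[m] * nu ** m for m in range(k))


def characteristic_rhs(nu, n, k):
    """(nu - n) ... (nu - n + k - 1), the right-hand side of the identity (17)."""
    product = 1
    for j in range(k):
        product *= nu - n + j
    return product


if __name__ == "__main__":
    print(r_coefficients(10, 2), mu_coefficient(10, 2))
    print(all(characteristic_lhs(nu, 10, 3) == characteristic_rhs(nu, 10, 3)
              for nu in range(-5, 20)))
```

For $k = 2$ this gives $r_0 = -n(n-1)$, $r_1 = 2n-1$ and $\mu = n(n-1)$, so that $\nu^2 - r_1\nu - r_0 = (\nu-n)(\nu-n+1)$.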

Thanks to (17), for all $j \in \{0,\ldots,k-1\}$ and any $x \in [0,a]$ we have

$$\frac{d^k}{dx^k}\exp\big((n-k+j+1)(x-a)\big) = \sum_{m=0}^{k-1} r_m^{n,k}\frac{d^m}{dx^m}\exp\big((n-k+j+1)(x-a)\big).$$

Using the expression of $F_{n,k}$, straightforward computations show that $\Theta_{n,k}(\lambda;\cdot)$ is a linear
combination of the exponential functions $z \mapsto \exp(-nz),\ldots,\exp(-(n-k+1)z)$; therefore

$$\frac{d^k}{dx^k}\Theta_{n,k}(\lambda;x) = \sum_{m=0}^{k-1} r_m^{n,k}\frac{d^m}{dx^m}\Theta_{n,k}(\lambda;x),$$

and thus (31) gives (16).
□


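Proof of Lemma 5.4. From (30), the equality in (18) is clear.
We claim that for any $0 \le m \le k-1$ and any $0 \le \ell \le k-1$

$$\frac{d^m}{dx^m}\Big(F_{n,\ell}(a;x) - F_{n,\ell+1}(a;x)\Big)\Big|_{x=a} \underset{n\to+\infty}{\sim} \binom{m}{\ell}(-1)^\ell n^m. \qquad (33)$$

In particular, $\frac{d^m}{dx^m}\big(F_{n,\ell}(a;x) - F_{n,\ell+1}(a;x)\big)\big|_{x=a} = 0 = \binom{m}{\ell}$ for $n$ large enough as soon as
$\ell > m$. The conclusion is then straightforward: using the definition (14) of $\Theta_{n,k}$, we get

$$\begin{aligned}
\frac{1}{n^m}\frac{d^m}{dx^m}\Theta_{n,k}(\lambda;x)\Big|_{x=a} &= \frac{1}{n^m}\sum_{\ell=0}^{k-1}\exp\Big(n\lambda\log\big(1-\tfrac{\ell}{n}\big)\Big)\frac{d^m}{dx^m}\Big(F_{n,\ell}(a;x) - F_{n,\ell+1}(a;x)\Big)\Big|_{x=a} \\
&\underset{n\to+\infty}{\to} \sum_{\ell=0}^{m}\binom{m}{\ell}(-1)^\ell\exp(-\ell\lambda) \\
&= \big(1-\exp(-\lambda)\big)^m.
\end{aligned}$$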


We now prove (33) by induction on $m$.

We first consider $m = 0$. Then for any $\ell \in \mathbb{N}^*$ we have $F_{n,\ell}(a;a) = 0$ and $F_{n,0}(a;a) = 1$
(by the convention $F_{n,0}(y;x) = \mathbb{1}_{y \ge x}$), and (33) holds.

Let us also consider $m = 1$, when $k \ge 2$. Then $\frac{d}{dx}F_{n,0}(a;x)\big|_{x=a} = 0$, while for any $x < a$ and any $\ell \ge 1$

$$\frac{d}{dx}F_{n,\ell}(a;x) = \frac{d}{dx}F_{n,\ell}(a-x;0) = -f_{n,\ell}(a-x;0) = -f_{n,\ell}(a;x),$$

as a consequence of the absence of memory property of the exponential distribution.

Now since $f_{n,\ell}(a;a) = n\mathbb{1}_{\ell=1}$, we get (33) for $m = 1$.

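The induction is based on the following relations (deduced from elementary computations;
for a proof see [BLR15], Section 6.3):

$$\begin{aligned}
&\frac{d}{dx}f_{n,1}(y;x) = nf_{n,1}(y;x), \\
&\text{for } \ell \in \{2,\ldots,n-1\}, \quad \frac{d}{dx}f_{n,\ell}(y;x) = (n-\ell+1)\big(f_{n,\ell}(y;x) - f_{n,\ell-1}(y;x)\big).
\end{aligned} \qquad (34)$$

Thanks to the first formula in (34), we easily get (33) for $\ell = 0$ by induction on $m$.
If now $\ell \in \{1,\ldots,k-1\}$, we have the recursive formula for $m \ge 1$

$$\begin{aligned}
\frac{d^{m+1}}{dx^{m+1}}\Big(F_{n,\ell}(a;x) - F_{n,\ell+1}(a;x)\Big)\Big|_{x=a}
&= \frac{d^m}{dx^m}\Big(f_{n,\ell+1}(a;x) - f_{n,\ell}(a;x)\Big)\Big|_{x=a} \\
&= (n-\ell)\frac{d^{m-1}}{dx^{m-1}}\Big(f_{n,\ell+1}(a;x) - f_{n,\ell}(a;x)\Big)\Big|_{x=a} \\
&\quad - (n-\ell+1)\frac{d^{m-1}}{dx^{m-1}}\Big(f_{n,\ell}(a;x) - f_{n,\ell-1}(a;x)\Big)\Big|_{x=a} \\
&= (n-\ell)\frac{d^m}{dx^m}\Big(F_{n,\ell}(a;x) - F_{n,\ell+1}(a;x)\Big)\Big|_{x=a} \\
&\quad - (n-\ell+1)\frac{d^m}{dx^m}\Big(F_{n,\ell-1}(a;x) - F_{n,\ell}(a;x)\Big)\Big|_{x=a}.
\end{aligned}$$

Finally, using the induction hypothesis, we obtain

$$\frac{d^{m+1}}{dx^{m+1}}\Big(F_{n,\ell}(a;x) - F_{n,\ell+1}(a;x)\Big)\Big|_{x=a} \underset{n\to+\infty}{\sim} n^{m+1}\Big((-1)^\ell\binom{m}{\ell} - (-1)^{\ell-1}\binom{m}{\ell-1}\Big) = n^{m+1}(-1)^\ell\binom{m+1}{\ell}.$$

This concludes the proof of Lemma 5.4.
□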


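Proof of Proposition 5.6. The $\nu^\ell_{n,k}(\lambda)$ are the roots of the characteristic equation associated
with the linear ODE (16):

$$\frac{(n-\nu)\cdots(n-k+1-\nu)}{n\cdots(n-k+1)} - \exp\Big(n\lambda\log\big(1-\tfrac{k}{n}\big)\Big) = 0,$$

which can be rewritten as a polynomial equation of degree $k$ with respect to the variable
$\overline{\nu}_n = \frac{\nu}{n}$:

$$(1-\overline{\nu}_n)\cdots\Big(1 - \frac{n}{n-k+1}\overline{\nu}_n\Big) - \exp\Big(n\lambda\log\big(1-\tfrac{k}{n}\big)\Big) = 0,$$

where $\exp\big(n\lambda\log(1-\tfrac{k}{n})\big) \underset{n\to+\infty}{\to} \exp(-k\lambda)$.

By continuity of the roots of polynomials of degree $k$ with respect to the coefficients, we
get that for all $\ell \in \{1,\ldots,k\}$ (with an appropriate ordering of the roots)

$$\overline{\nu}^\ell_{n,k}(\lambda) := \frac{\nu^\ell_{n,k}(\lambda)}{n} \underset{n\to+\infty}{\to} \overline{\nu}^\ell_{\infty,k}(\lambda),$$

where $\big(1 - \overline{\nu}^\ell_{\infty,k}(\lambda)\big)^k = e^{-k\lambda}$. This identity immediately yields (20).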
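For $k = 2$ the characteristic equation $(n-\nu)(n-1-\nu) - n(n-1)\exp\big(n\lambda\log(1-\tfrac{2}{n})\big) = 0$ is quadratic, so the convergence of the rescaled roots can be checked directly. The sketch below is only an illustration of this special case; the function name and the chosen values of $n$ and $\lambda$ are assumptions. The expected limits are $1 - e^{-\lambda}$ and $1 + e^{-\lambda}$, the two solutions of $(1-\overline{\nu})^2 = e^{-2\lambda}$.

```python
import cmath
import math


def rescaled_roots_k2(n, lam):
    """Roots nu / n of (n - nu)(n - 1 - nu) - n(n - 1) c = 0, c = (1 - 2/n)**(n*lam)."""
    c = math.exp(n * lam * math.log(1.0 - 2.0 / n))
    # Expanded form: nu^2 - (2n - 1) nu + n (n - 1) (1 - c) = 0
    b = -(2 * n - 1)
    d = n * (n - 1) * (1.0 - c)
    sqrt_disc = cmath.sqrt(b * b - 4.0 * d)
    return ((-b - sqrt_disc) / (2.0 * n), (-b + sqrt_disc) / (2.0 * n))


if __name__ == "__main__":
    lam = 0.7
    print("limits:", 1.0 - math.exp(-lam), 1.0 + math.exp(-lam))
    for n in (10, 1_000, 1_000_000):
        print(n, rescaled_roots_k2(n, lam))
```

The root with the smaller real part is the one that drives the asymptotic behaviour of the log-Laplace transform.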

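As a consequence, for $n$ large enough the roots $(\nu^\ell_{n,k}(\lambda))_{1\le\ell\le k}$ are pairwise distinct. Then (19)
holds for some complex numbers $\gamma^\ell_{n,k}(\lambda)$, where $\ell \in \{1,\ldots,k\}$. Thanks to (19) and its derivatives
evaluated at $x = a$, these coefficients are solution of the following linear system of equations:

$$\begin{cases}
\gamma^1_{n,k}(\lambda) + \ldots + \gamma^k_{n,k}(\lambda) = \Gamma_{n,k}(\lambda;a), \\
\quad\vdots \\
\gamma^1_{n,k}(\lambda)\big(\nu^1_{n,k}(\lambda)\big)^{k-1} + \ldots + \gamma^k_{n,k}(\lambda)\big(\nu^k_{n,k}(\lambda)\big)^{k-1} = \frac{d^{k-1}}{dx^{k-1}}\Gamma_{n,k}(\lambda;x)\big|_{x=a}.
\end{cases} \qquad (35)$$

This system is equivalent to

$$\begin{cases}
\gamma^1_{n,k}(\lambda) + \ldots + \gamma^k_{n,k}(\lambda) = 1, \\
\quad\vdots \\
\gamma^1_{n,k}(\lambda)\big(\overline{\nu}^1_{n,k}(\lambda)\big)^{k-1} + \ldots + \gamma^k_{n,k}(\lambda)\big(\overline{\nu}^k_{n,k}(\lambda)\big)^{k-1} = \frac{1}{n^{k-1}}\frac{d^{k-1}}{dx^{k-1}}\Gamma_{n,k}(\lambda;x)\big|_{x=a} \underset{n\to+\infty}{\sim} \big(\overline{\nu}^1_{\infty,k}(\lambda)\big)^{k-1},
\end{cases} \qquad (36)$$

thanks to (18) and (20), where $\overline{\nu}^\ell_{n,k}(\lambda) = \frac{\nu^\ell_{n,k}(\lambda)}{n} \underset{n\to+\infty}{\to} \overline{\nu}^\ell_{\infty,k}(\lambda)$.

It is now easy to get (21), which concludes the proof of Proposition 5.6.
□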


8 Conclusion and perspectives 

We have established (Theorem 3.1) a Large Deviations Principle result for the Adaptive Multilevel
Splitting AMS($n$, $k$) Algorithm in an idealized setting, when the number of replicas $n$
goes to infinity while the parameter $k$ and the threshold $a$ remain fixed. The rate function
does not depend on $k$: when $k = 1$, the proof is very simple and uses an interpretation of the
algorithm with a Poisson process (the number of iterations follows a Poisson distribution).
When $k > 1$, we rely on a functional equation technique which was already used to prove
unbiasedness and asymptotic normality of the estimator in the previous works [BLR15] and
[BGT14].

We were able to relate the efficiency of the algorithm with this Large Deviations result,
through a comparison with two algorithms (see Section 6): a crude Monte-Carlo method and a
non-adaptive version. More generally, in other situations Large Deviations could be a powerful
tool to compare adaptive or non-adaptive multilevel splitting algorithms, instead of relying
only on the comparison of asymptotic variances associated with central limit theorems.

Let us mention a few open directions for future works. First, it would be interesting to
look at the regime where $k$ also goes to infinity, with $k/n$ converging to a proportion $\alpha \in (0,1)$.
We expect to prove that the optimal rate function is obtained for $\alpha$ decreasing to 0: indeed, the
asymptotic variance is minimized in this regime. A comparison with a non-adaptive version
of the algorithm is expected to show that the adaptive algorithm behaves (in terms of large
deviations) like the non-adaptive version when the number of replicas and of levels goes to
infinity, like in the regime we have studied in this paper.
A severe restriction is given by the so-called idealized setting: we need to know how to
sample according to the conditional distribution $\mathcal{L}(X|X > x)$. In practice, and especially
when computing crossing probabilities for high dimensional metastable stochastic processes,
this assumption is not satisfied and the multilevel splitting algorithm needs to use an importance function
to define appropriate levels, and at each step the computation of the new sample uses the one
at the previous iteration (thanks to a branching procedure of the successful trajectories). A
natural question is whether one can prove a Large Deviations Principle in such a framework,
and study quantitatively how the rate function depends on the importance function.

In fact, when using both non-adaptive (see [GKvO02], [GHSZ98]) and adaptive ([BGG+],
in preparation) multilevel splitting algorithms, one may observe a very large difference between
the value of the estimator (averaged over a number $M$ of independent realizations) and the
true result, or between the results obtained for different choices of the importance function.
Even if the estimator of the probability is unbiased, in such situations one observes an apparent
bias toward smaller values if $M$ is not sufficiently large. This phenomenon is explained by a
specificity of the models: there are several channels to reach the region $B$ from $A$ (in the case
of the estimation of crossing probabilities between metastable states of a Markov process),
which may be sampled very differently when the importance function changes. It would be
interesting to investigate the relation between this phenomenon and the Large Deviations
Principle for the associated estimator.